Automatic Table Optimization
Dremio Arctic enables you to run queries against Iceberg tables using supported query engines (Spark, Dremio Sonar, Flink). To ensure queries are run efficiently, Arctic provides an automated jobs service that optimizes the storage of Iceberg tables.
This job automation helps you manage the accumulation of small data files that occurs through DML operations. Over time, queries become less efficient because of the increased processing time required to scan many small files. Arctic counteracts this accumulation by compacting small files into larger ones.
For more details on how Dremio optimizes Iceberg tables, see How Dremio Optimizes a Table.
Configuring the Arctic Jobs Service
In order for Arctic to automatically run table optimization, you first need to set up the jobs service. This entails selecting the cloud that will run the table optimization jobs, providing an Amazon S3 bucket path where Arctic will store the log files associated with the maintenance jobs, and providing data access credentials that will allow read/write access to the tables in this Arctic catalog and to the log file location.

To configure the Arctic jobs service:
From the Arctic Catalogs (Preview) page, locate the catalog that you want to optimize and select the Catalog Settings (gear) icon.
On the Catalog Settings page, select Configuration in the left menu bar.
On the Configuration page, do the following:
Under Compute resource > Cloud, select the cloud that you want to use to run the table optimization and cleanup jobs. If you want to add a new cloud, see Managing Clouds. For more information on cloud setup, see Configuring Cloud Resources Manually.
For Engine size, select an available size based on the selected cloud. We recommend a small engine size for optimization jobs to keep costs low.
Under Log file location, enter the S3 path where you want Arctic to store the metadata files associated with the maintenance operations. For example, s3://yourBucketName/folder.
Under Data access credentials, choose either Access Key or IAM Role.
If choosing Access Key, you need to provide the AWS access key ID and AWS secret access key.
If choosing IAM Role, you need to create an AWS IAM role:
- Copy the Dremio Trust Account ID and the External ID and enter them when you create the IAM role.
- Provide the AWS cross-account role ARN and the AWS instance profile ARN from your AWS account.
notes:
- You can also use the Engine API to add, update, and retrieve engines that run the optimization jobs; a sketch follows these notes.
- When you delete an Arctic catalog, the optimization jobs service for that catalog will also be deleted.
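As a rough illustration of calling the Engine API from a script, here is a minimal Python sketch. The endpoint path, payload fields, and token handling are assumptions for illustration only; see the Engine API reference for the actual contract.

```python
# Minimal sketch of registering an engine for optimization jobs via the
# Engine API. The endpoint path, payload fields, and auth scheme below are
# assumptions for illustration -- consult the Engine API reference.
import requests

BASE_URL = "https://api.dremio.cloud"   # assumed control-plane endpoint
CATALOG_ID = "your-catalog-id"          # placeholder
HEADERS = {
    "Authorization": "Bearer your-personal-access-token",  # placeholder
    "Content-Type": "application/json",
}

# Hypothetical payload: a small engine, per the sizing recommendation above.
engine = {"size": "SMALL"}

resp = requests.post(
    f"{BASE_URL}/v0/arctic/catalogs/{CATALOG_ID}/engines",
    headers=HEADERS,
    json=engine,
)
resp.raise_for_status()
print(resp.json())
```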
Configuring Optimization on Tables
After the Arctic jobs service is configured, you can manually run an optimization job or set a schedule to run this job at preset times. These options work on a selected table in a selected branch, which means that the same table in another branch is not affected.

- Optimize Once: Run a one-off optimization job.
- Optimize regularly: Run an optimization job on a preset schedule.
Before an optimization job runs, a check is performed to determine whether new snapshots have been added to the Iceberg table since the last successful optimization job. If no new snapshots exist, the optimization job is canceled.
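Conceptually, the check compares the table's current snapshot against the snapshot recorded at the last successful optimization. The following PyIceberg sketch illustrates the idea; it is not Arctic's implementation, and the catalog name, table name, and the recorded last_optimized_snapshot_id are assumptions.

```python
# Conceptual illustration of the pre-run snapshot check (not Arctic's code).
from pyiceberg.catalog import load_catalog

catalog = load_catalog("arctic")            # assumes a configured catalog
table = catalog.load_table("demo.orders")   # placeholder table name

last_optimized_snapshot_id = 1234567890     # assumed bookkeeping from the last run
current = table.current_snapshot()

if current is None or current.snapshot_id == last_optimized_snapshot_id:
    print("No new snapshots since the last optimization; the job is canceled.")
else:
    print("New snapshots found; the optimization proceeds.")
```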
To set up an optimization schedule:
- From the Arctic Catalogs (Preview) page, select a catalog.
- From the Catalog page, under the Data tab, locate the table that you want to run an optimization job for.
- Hover over the table and select the Settings (gear) icon.
- In the Dataset Settings dialog, on the Table Optimization tab, under the Optimize regularly section, set up a schedule:
- In the Every field, select a frequency (hours, day, week, month).
- If selecting hours, enter a number. For example, you can run an optimization job every 6 hours.
- If selecting day, enter a time. For example, you can run an optimization job at 18:00 every day.
- If selecting week, choose the day(s) of the week and enter a time. For example, you can run an optimization job on Wednesday and Saturday at 18:00 every week.
- If selecting month, select whether you want to run an optimization job on a certain date or on a certain day of the month and enter a time. For example, you can run an optimization job on the 15th day of every month. Or you can run the job on the 2nd Saturday of every month.
- Under Advanced Configuration, set the following options (for information about the options, see the parameters for Optimize Table); the sketch after this list shows how they interact:
- Under Target file size, enter the file size (in MB) that optimization targets when combining files.
- Under Min. input files, enter the minimum number of qualified files needed to be considered for optimization.
- Under Min. file size, enter a minimum file size that qualifies for optimization.
- Under Max. file size, enter a maximum file size that qualifies for optimization.
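To make the thresholds concrete, here is a small self-contained Python sketch (not Arctic's code) of how these options interact: files outside the min/max size band qualify for rewriting, the job only proceeds when enough files qualify, and qualified bytes are rewritten toward the target size. The default values shown are illustrative.

```python
# Illustrative file-selection logic for compaction (not Arctic's code).
import math

MB = 1024 * 1024

def plan_compaction(file_sizes, target_file_size=256 * MB,
                    min_input_files=5, min_file_size=192 * MB,
                    max_file_size=448 * MB):
    # Files smaller than the minimum or larger than the maximum qualify
    # for rewriting; files already near the target size are left alone.
    qualified = [s for s in file_sizes if s < min_file_size or s > max_file_size]
    if len(qualified) < min_input_files:
        return None  # too few qualifying files; the job has nothing to do
    # Qualified bytes are rewritten into files of roughly the target size.
    est_outputs = math.ceil(sum(qualified) / target_file_size)
    return {"input_files": len(qualified), "estimated_output_files": est_outputs}

# Example: forty 16 MB files plus three files already near the target size.
sizes = [16 * MB] * 40 + [256 * MB] * 3
print(plan_compaction(sizes))
# -> {'input_files': 40, 'estimated_output_files': 3}
```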
You can also use the Arctic optimization APIs to run a one-off job and to run an optimization job on a preset schedule. For more information, see the Jobs API and Schedules API.
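As a rough sketch of the API route, the following Python snippet shows what a one-off job submission and a recurring schedule could look like. The endpoint paths, payload fields, and the cron-style schedule string are assumptions for illustration; see the Jobs API and Schedules API references for the actual contracts.

```python
# Sketch of submitting a one-off optimization job and a recurring schedule.
# Endpoint paths and payload fields are assumptions for illustration.
import requests

BASE_URL = "https://api.dremio.cloud"   # assumed control-plane endpoint
CATALOG_ID = "your-catalog-id"          # placeholder
HEADERS = {
    "Authorization": "Bearer your-personal-access-token",  # placeholder
    "Content-Type": "application/json",
}

# One-off optimization job for a table on a specific branch (hypothetical payload).
job = {"type": "OPTIMIZE", "tableId": "demo.orders", "reference": "main"}
requests.post(f"{BASE_URL}/v0/arctic/catalogs/{CATALOG_ID}/jobs",
              headers=HEADERS, json=job).raise_for_status()

# Recurring schedule: every day at 18:00, expressed as an assumed cron string.
schedule = {"type": "OPTIMIZE", "tableId": "demo.orders",
            "reference": "main", "schedule": "0 18 * * *"}
requests.post(f"{BASE_URL}/v0/arctic/catalogs/{CATALOG_ID}/schedules",
              headers=HEADERS, json=schedule).raise_for_status()
```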
Managing Optimization Jobs
Optimization job records are stored on the Jobs page. The owners of optimization jobs can see their own job history. Owners of an Arctic catalog can see all optimization jobs for the catalog.
To view optimization job history:
- From the Arctic Catalogs (Preview) page, select the catalog that you want to manage.
- From the Catalog page, select the Jobs icon in the left menu bar.
Viewing Jobs and Job Details
All jobs run in Arctic are listed on a separate page, showing the target, status, user, and other attributes. To navigate to the Jobs page, click the Jobs icon in the side navigation bar.
Search Filter and Columns
By default, the Jobs page lists the optimization jobs run within the last 30 days. You can filter on values and manage columns directly on the Jobs page:

- Filter is a free-form text search bar that enables you to search jobs by table name, reference name (branch, tag, or commit), username, and job ID.
- Status represents one or more job states. For descriptions, see Job Statuses.
- Type includes Optimize and Vacuum.
note:
The vacuum service is currently not available. All jobs listed will be optimization jobs.
- User can be filtered by typing a username or by checking the box next to a username in the dropdown.
- The list of optimization jobs includes these columns:
- Target lists the tables that are being optimized or have been optimized in the catalog.
note:
You can see the reference by hovering over the table name.

- Status identifies the status of the optimization job. For a description of the available statuses, see Job Statuses.
- User shows the user’s email address.
- Job Type identifies that the job is an optimization job.
- Engine Size identifies the size of the engine that was used to run the optimization job.
- Start Time shows when the optimization job started; you can filter jobs by a specified timeframe.
- Duration shows how long the optimization job ran.
- Job ID shows the unique ID of the optimization job.
You can also use the Arctic optimization APIs to retrieve job status. For more information, see the Jobs API.
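For example, a status poll could look like the following Python sketch. The endpoint path and the response field name are assumptions for illustration; see the Jobs API reference for the actual contract.

```python
# Sketch of polling optimization job status via the Jobs API.
# The endpoint path and response fields are assumptions for illustration.
import time
import requests

BASE_URL = "https://api.dremio.cloud"   # assumed control-plane endpoint
CATALOG_ID = "your-catalog-id"          # placeholder
JOB_ID = "your-job-id"                  # placeholder
HEADERS = {"Authorization": "Bearer your-personal-access-token"}  # placeholder

while True:
    resp = requests.get(
        f"{BASE_URL}/v0/arctic/catalogs/{CATALOG_ID}/jobs/{JOB_ID}",
        headers=HEADERS,
    )
    resp.raise_for_status()
    state = resp.json().get("state")    # assumed field name
    print(f"Job {JOB_ID}: {state}")
    if state in ("COMPLETED", "CANCELED", "FAILED"):
        break
    time.sleep(30)                      # poll every 30 seconds
```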
Job Statuses
Each optimization job passes through a sequence of states until it is complete, though the sequence can be interrupted if a query is canceled or if there is an error during a state. This table lists the statuses that the UI lets you filter on and shows how they map to the states:
Status | Description
---|---
Setup | Represents a state where the optimization job is in the process of being set up.
Queued | Represents a state where the optimization job is queued.
Running | Represents a state where the optimization job is running.
Completed | Represents a terminal state that indicates that the optimization job completed successfully. You can see the results of the completed job by hovering over the status.
Canceled | Represents a terminal state that indicates that the optimization job was canceled or interrupted by the system.
Failed | Represents a terminal state that indicates that the optimization job failed due to an error. You can see the error message for the failed job by hovering over the status.
Troubleshooting Jobs
This section can help you troubleshoot issues that you encounter when optimization jobs do not run as expected. The Jobs page shows all the optimization jobs that have been run in a catalog, including the jobs that failed to run.
Issue: The metadata location for the Iceberg table cannot be accessed.
Solution: Verify that the data access credentials are valid and have access to the table metadata location.
Issue: The compute resources and/or data access credentials that you configured are no longer valid.
Solution: Check the configuration settings and update the compute resources and/or data access credentials. For instructions, see Configuring the Arctic Jobs Service.
Issue: The optimization job has failed due to an internal error.
Solution: You can try running the job again. If the problem persists, contact Dremio Support.
Limitations
- Dremio Consumption Units (DCU) usage reporting for automatic table optimization in Arctic catalogs is not available.