Vacuum

Dremio Arctic can run vacuum (table cleanup) jobs to remove expired snapshots and orphaned metadata files for Iceberg tables. This API allows you to add, retrieve, modify, and delete cutoff policies and to enable and disable schedules for Arctic catalog vacuum jobs.

Vacuum Object
{
"defaultCutoffPolicy": "P5D",
"isVacuumScheduled": true
}

Vacuum Attributes

defaultCutoffPolicy String

Duration string that describes the cutoff policy for files that Dremio removes when the VACUUM CATALOG command runs on an Arctic catalog. The duration string starts with P and is followed by the duration value and D to indicate days. For example, P1D means 1 day, P30D means 30 days, and P90D means 90 days. Dremio propagates the defaultCutoffPolicy value to the Nessie garbage collector configuration. For more information, read Enabling Table Cleanup and Setting the Cutoff Policy.

Example: P5D
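
The duration format described above can be checked in a script before a request is sent. The following is a minimal sketch in POSIX shell; `cutoff_policy_days` is a hypothetical helper name, not part of the Dremio API, and it checks only the `P<days>D` form described here, not any limits the service may enforce on the value.

```shell
# Hypothetical helper (not part of the Dremio API).
# Prints the number of days in a P<days>D duration string,
# or returns non-zero if the string does not have that form.
cutoff_policy_days() {
  case "$1" in
    P*D) ;;                       # must start with P and end with D
    *) return 1 ;;
  esac
  days=${1#P}                     # strip the leading P
  days=${days%D}                  # strip the trailing D
  case "$days" in
    ''|*[!0-9]*) return 1 ;;      # reject empty or non-numeric values
  esac
  printf '%s\n' "$days"
}
```

For example, `cutoff_policy_days P30D` prints `30`, while `cutoff_policy_days 30D` prints nothing and returns a non-zero status.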


isVacuumScheduled Boolean

Indicates whether an automatic VACUUM CATALOG schedule is enabled for the specified catalog. For more information, read Enabling Table Cleanup and Setting the Cutoff Policy.

Example: true

Creating a Cutoff Policy and Vacuum Schedule

Create a cutoff policy and enable or disable the vacuum schedule for the specified Arctic catalog.

Method and URL
POST /v0/arctic/catalogs/{catalogId}/engine/vacuum

Parameters

catalogId Path   String (UUID)

Unique identifier for the Arctic catalog whose VACUUM CATALOG configuration you wish to create.

Example: d34bd884-4d19-fe37-ac42-e45443b8234c


defaultCutoffPolicy Body   String

Duration string that describes the cutoff policy for files that Dremio removes when the VACUUM CATALOG command runs on an Arctic catalog. The duration string starts with P and is followed by the duration value and D to indicate days. For example, P1D means 1 day, P30D means 30 days, and P90D means 90 days. Dremio propagates the defaultCutoffPolicy value you provide to the Nessie garbage collector configuration.

Example: P5D


isVacuumScheduled Body   Boolean

To enable table cleanup so that Dremio automatically runs the VACUUM CATALOG command on the specified Arctic catalog once per day at 00:00 UTC, set to true. Otherwise, set to false. If you set isVacuumScheduled to false and provide a valid value for defaultCutoffPolicy in the same request, Dremio propagates the defaultCutoffPolicy value to the Nessie garbage collector configuration and disables the VACUUM CATALOG schedule for the specified Arctic catalog. If you want Dremio to automatically run VACUUM CATALOG at a different interval and time than daily at 00:00 UTC, use the Arctic Catalog Schedules API to create or modify the VACUUM CATALOG schedule.

Example: true

Example Request
curl -X POST 'https://api.dremio.cloud/v0/arctic/catalogs/d34bd884-4d19-fe37-ac42-e45443b8234c/engine/vacuum' \
--header 'Authorization: Bearer <PersonalAccessToken>' \
--header 'Content-Type: application/json' \
--data-raw '{
"defaultCutoffPolicy": "P5D",
"isVacuumScheduled": true
}'
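
When the request body is generated by a script, it helps to validate both values first and to emit `isVacuumScheduled` as a JSON Boolean, matching the parameter type listed above. The following is a minimal sketch in POSIX shell; `vacuum_payload` is a hypothetical helper name, and the validation covers only the `P<days>D` form and the two Boolean literals.

```shell
# Hypothetical helper (not part of the Dremio API).
# $1 = cutoff policy such as P5D, $2 = true or false.
# Prints a JSON request body, or returns non-zero on invalid input.
vacuum_payload() {
  policy=$1
  scheduled=$2
  case "$policy" in
    P*D) ;;
    *) echo "invalid cutoff policy: $policy" >&2; return 1 ;;
  esac
  digits=${policy#P}
  digits=${digits%D}
  case "$digits" in
    ''|*[!0-9]*) echo "invalid cutoff policy: $policy" >&2; return 1 ;;
  esac
  case "$scheduled" in
    true|false) ;;
    *) echo "isVacuumScheduled must be true or false" >&2; return 1 ;;
  esac
  # The Boolean is written without quotes so that it is a JSON Boolean, not a string.
  printf '{"defaultCutoffPolicy": "%s", "isVacuumScheduled": %s}\n' "$policy" "$scheduled"
}
```

The output can then be passed to curl, for example with `--data-raw "$(vacuum_payload P5D true)"`.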

Example Response

A successful request to create a vacuum cutoff policy and schedule returns an empty response with the HTTP 200 OK status response code.

Response Status Codes

200   OK

400   Bad Request

404   Not Found

500   Internal Server Error

Retrieving a Vacuum Cutoff Policy and Schedule

Retrieve the vacuum cutoff policy and schedule configuration for the specified Arctic catalog.

Method and URL
GET /v0/arctic/catalogs/{catalogId}/engine/vacuum

Parameters

catalogId Path   String (UUID)

Unique identifier for the Arctic catalog whose VACUUM CATALOG configuration you wish to retrieve.

Example: d34bd884-4d19-fe37-ac42-e45443b8234c

Example Request
curl -X GET 'https://api.dremio.cloud/v0/arctic/catalogs/d34bd884-4d19-fe37-ac42-e45443b8234c/engine/vacuum' \
--header 'Authorization: Bearer <PersonalAccessToken>' \
--header 'Content-Type: application/json'

Example Response
{
"defaultCutoffPolicy": "P5D",
"isVacuumScheduled": true
}
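
A script that consumes this response usually needs the two values separately. The following is a minimal sketch in POSIX shell that reads the response on standard input; `vacuum_field` is a hypothetical helper name, and the `sed` pattern assumes a flat object like the one shown above. A JSON parser such as `jq` is a more robust choice where it is available.

```shell
# Hypothetical helper (not part of the Dremio API).
# $1 = field name; reads a flat JSON object on stdin and prints that field's value.
vacuum_field() {
  # Join the lines, then capture the value after "<field>":, with or without quotes.
  tr -d '\n' |
    sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\{0,1\}\([^",}[:space:]]*\).*/\1/p'
}
```

For example, piping the response above into `vacuum_field defaultCutoffPolicy` prints `P5D`, and `vacuum_field isVacuumScheduled` prints `true`.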

Response Status Codes

200   OK

404   Not Found

500   Internal Server Error