Skip to main content

VACUUM CATALOG

Remove expired snapshots and orphaned metadata files for all Iceberg tables in the specified Arctic catalog.

For Arctic catalog sources, catalog owners and members of the ADMIN role may manually run the VACUUM CATALOG SQL command in the SQL API or the SQL Runner to start a vacuum job on demand.

note

For Arctic catalogs, Dremio automatically runs the VACUUM CATALOG SQL command if you enable table cleanup and set a cutoff policy in the Dremio console.

Before you can manually run the VACUUM CATALOG SQL command, you must complete the following tasks:

  1. Add the compute resources and data access credentials that are required to automate routine maintenance operations, including table cleanup, for the Arctic catalog. The data access credentials must permit read and write access on all tables and folders that the catalog contains.
  2. Enable table cleanup and set the number of days to use as the cutoff policy for the catalog, either in the Dremio console or with the Vacuum API.
  3. Add the Arctic catalog as a source in Sonar.
Syntax
VACUUM CATALOG <source_name> 
[ EXCLUDE ( <table_name>
[ AT { REF[ERENCE] | BRANCH | TAG | COMMIT | SNAPSHOT | TIMESTAMP } <reference_name> ]
[ AS OF <timestamp> ]
[ , ... ] ) ]

Parameters

<source_name> String

The name of the Arctic catalog source to vacuum.


<table_name> String   Optional

The name of the table that you want to exclude in the catalog source. To list multiple tables, use commas to separate the table names.


AT { REF[ERENCE] | BRANCH | TAG | COMMIT | SNAPSHOT | TIMESTAMP } <reference_name> String   Optional

Specifies the table that you want to exclude. When this parameter is omitted, the current reference is used.

<reference_name> is the name of the reference to be used with one of the following:

  • REF[ERENCE]: Identifies a specific branch, tag, or commit.
  • BRANCH: Identifies a specific branch.
  • TAG: Identifies a specific tag.
  • COMMIT: Identifies a specific commit. Commit hashes must be enclosed in double quotes. For example, “ff2fe50fef5a030c4fc8e61b252bdc33c72e2b6f929d813833d998b8368302e2”.
  • SNAPSHOT: Identifies an earlier version of a table to read.
  • TIMESTAMP: Identifies a specific timestamp.

References help identify the table. After the table is identified, the exclusion applies to this table across all branches, commits, and other references. Using a reference does not exclude the table only at that reference.


AS OF <timestamp> String   Optional

Available for Iceberg table queries only. Changes the commit reference point to the most recent Iceberg snapshot as of the provided timestamp. <timestamp> may be any SQL expression that resolves to a single timestamp type value. For example, CAST( DATE_SUB(CURRENT_DATE,1) AS TIMESTAMP ) or TIMESTAMP '2022-07-01 01:30:00.000'.

Examples

Vacuum a catalog
VACUUM CATALOG arctic_catalog;
Vacuum a catalog and exclude a table
VACUUM CATALOG arctic_catalog 
EXCLUDE (t1);
Vacuum a catalog and exclude a table referenced at a branch
VACUUM CATALOG arctic_catalog 
EXCLUDE (t1 AT BRANCH dev);
Vacuum a catalog and exclude multiple tables with one of them referenced at a branch
VACUUM CATALOG arctic_catalog 
EXCLUDE (t1 AT BRANCH dev, t2);
Vacuum a catalog and exclude multiple tables with one referenced at a tag and the other at a branch
VACUUM CATALOG arctic_catalog 
EXCLUDE (t1 AT TAG namedTag, t2 AT BRANCH dev);