VACUUM CATALOG
Remove expired snapshots and orphaned metadata files for all Iceberg tables in the specified Arctic catalog.
For Arctic catalog sources, catalog owners and members of the ADMIN
role may manually run the VACUUM CATALOG
SQL command in the SQL API or the SQL Runner to start a vacuum job on demand.
For Arctic catalogs, Dremio automatically runs the VACUUM CATALOG
SQL command if you enable table cleanup and set a cutoff policy in the Dremio console.
Before you can manually run the VACUUM CATALOG
SQL command, you must complete the following tasks:
- Add the compute resources and data access credentials that are required to automate routine maintenance operations, including table cleanup, for the Arctic catalog. The data access credentials must permit read and write access on all tables and folders that the catalog contains.
- Enable table cleanup and set the number of days to use as the cutoff policy for the catalog, either in the Dremio console or with the Vacuum API.
- Add the Arctic catalog as a source in Sonar.
VACUUM CATALOG <source_name>
[ EXCLUDE ( <table_name>
[ AT { REF[ERENCE] | BRANCH | TAG | COMMIT | SNAPSHOT | TIMESTAMP } <reference_name> ]
[ AS OF <timestamp> ]
[ , ... ] ) ]
Parameters
<source_name> String
The name of the Arctic catalog source to vacuum.
<table_name> String Optional
The name of the table that you want to exclude in the catalog source. To list multiple tables, use commas to separate the table names.
AT { REF[ERENCE] | BRANCH | TAG | COMMIT | SNAPSHOT | TIMESTAMP } <reference_name> String Optional
Specifies the table that you want to exclude. When this parameter is omitted, the current reference is used.
<reference_name> is the name of the reference to be used with one of the following:
REF[ERENCE]
: Identifies a specific branch, tag, or commit.BRANCH
: Identifies a specific branch.TAG
: Identifies a specific tag.COMMIT
: Identifies a specific commit. Commit hashes must be enclosed in double quotes. For example,“ff2fe50fef5a030c4fc8e61b252bdc33c72e2b6f929d813833d998b8368302e2”
.SNAPSHOT
: Identifies an earlier version of a table to read.TIMESTAMP
: Identifies a specific timestamp.
References help identify the table. After the table is identified, the exclusion applies to this table across all branches, commits, and other references. Using a reference does not exclude the table only at that reference.
AS OF <timestamp> String Optional
Available for Iceberg table queries only. Changes the commit reference point to the most recent Iceberg snapshot as of the provided timestamp. <timestamp>
may be any SQL expression that resolves to a single timestamp type value. For example, CAST( DATE_SUB(CURRENT_DATE,1) AS TIMESTAMP )
or TIMESTAMP '2022-07-01 01:30:00.000'
.
Examples
Vacuum a catalogVACUUM CATALOG arctic_catalog;
VACUUM CATALOG arctic_catalog
EXCLUDE (t1);
VACUUM CATALOG arctic_catalog
EXCLUDE (t1 AT BRANCH dev);
VACUUM CATALOG arctic_catalog
EXCLUDE (t1 AT BRANCH dev, t2);
VACUUM CATALOG arctic_catalog
EXCLUDE (t1 AT TAG namedTag, t2 AT BRANCH dev);