Clean Metadata
This topic provides usage information for the dremio-admin clean CLI command.
Requirements
- Perform a backup before running the command (see Backup for more information).
- Shut down all cluster nodes completely before running the command (see Startup/Shutdown for more information).
Syntax
dremio-admin clean <options>
You must specify at least one option for the command to perform an operation. If you do not specify any options, the command opens the metadata store but does not perform any cleanup operations. In this case, it returns the message "No operation requested."
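If you script cleanup runs, building the argument list from the documented long options keeps each invocation explicit. The following is a minimal sketch; the helper name build_clean_args and its keyword parameters are hypothetical, and only the flags themselves come from this page. Calling it with no keyword arguments yields the bare command, which performs no cleanup.

```python
from typing import List, Optional


def build_clean_args(
    compact: bool = False,
    delete_orphans: bool = False,
    reindex_data: bool = False,
    delete_orphan_profiles: bool = False,
    max_job_days: Optional[int] = None,
    delete_orphan_datasetversions: Optional[int] = None,
) -> List[str]:
    """Return an argv list for `dremio-admin clean` (hypothetical helper)."""
    args = ["dremio-admin", "clean"]
    # Boolean options: add the long form only when the caller enables it.
    for enabled, flag in (
        (compact, "--compact"),
        (delete_orphans, "--delete-orphans"),
        (reindex_data, "--reindex-data"),
        (delete_orphan_profiles, "--delete-orphan-profiles"),
    ):
        if enabled:
            args.append(flag)
    # Day thresholds: use the --option=N form shown in the examples on this page.
    for days, flag in (
        (max_job_days, "--max-job-days"),
        (delete_orphan_datasetversions, "--delete-orphan-datasetversions"),
    ):
        if days is not None:
            args.append(f"{flag}={days}")
    return args


print(build_clean_args(compact=True, max_job_days=7))
# ['dremio-admin', 'clean', '--compact', '--max-job-days=7']
```

The returned list can be passed to subprocess.run once a backup has been taken and all cluster nodes are shut down.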
Options
-c, --compact
compact the kvstore (metadata store)
Default: false
-o, --delete-orphans
delete orphaned records in the kvstore (e.g., old splits)
Default: false
-h, --help
show usage
-j, --max-job-days
delete jobs, profiles, and temporary dataset versions older than provided number of days
Default: 2147483647
-i, --reindex-data
reindex the data in the kvstore
Default: false
-p, --delete-orphan-profiles
delete orphaned job profiles
Default: false
-d, --delete-orphan-datasetversions
delete orphaned dataset versions (versions that Dremio is not using) older than the provided number of days
Default: 2147483647
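The default of 2147483647 days for -j and -d is 2^31 - 1, the largest value a signed 32-bit integer can hold (the same value as Java's Integer.MAX_VALUE). A quick calculation shows why a threshold that large matches nothing in practice:

```python
DEFAULT_DAYS = 2147483647

# The documented default equals the largest signed 32-bit integer.
assert DEFAULT_DAYS == 2**31 - 1

# Average Gregorian year length in days.
DAYS_PER_YEAR = 365.2425
years = DEFAULT_DAYS / DAYS_PER_YEAR
print(f"about {years / 1e6:.2f} million years")
# about 5.88 million years
```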
If you do not specify any options, the output of the dremio-admin clean command is a report of metadata store statistics.
Examples
This section provides examples for using the dremio-admin clean command.
Compact Metadata
Compacts metadata store entries.
dremio-admin clean -c
dremio-admin clean --compact
Delete Orphaned Entries
Deletes orphaned metadata store entries.
dremio-admin clean -o
dremio-admin clean --delete-orphans
Delete Jobs
Deletes jobs, profiles, and temporary dataset versions older than the specified number of days. If you do not specify a threshold, the default of 2147483647 days is used; because no items are that old, the default effectively deletes nothing. We recommend a reasonable value, such as 7, 14, or 30.
dremio-admin clean -j=7
dremio-admin clean --max-job-days=7
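To reason about which jobs a threshold covers, you can compute the cutoff instant it implies. This sketch assumes that "older than N days" means older than the current UTC time minus N × 24 hours; Dremio's exact comparison (for example, time zone handling or rounding to midnight) is not described on this page, so treat the result as an approximation. The function name job_cutoff is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def job_cutoff(days: int, now: Optional[datetime] = None) -> datetime:
    """Return the instant before which items count as older than `days`.

    Assumption: the threshold is measured back from the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - timedelta(days=days)


# With a fixed "now", -j=7 would cover jobs created before 8 January 2024 12:00 UTC.
fixed_now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
print(job_cutoff(7, fixed_now).isoformat())
# 2024-01-08T12:00:00+00:00
```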
Re-index Data
Re-indexes data in the metadata store.
dremio-admin clean -i
dremio-admin clean --reindex-data
Delete Orphaned Profiles
Deletes orphaned Dremio job profiles.
dremio-admin clean -p
dremio-admin clean --delete-orphan-profiles
Delete Orphaned Dataset Versions
This option is available in Dremio 19.6.3+, 19.8.0+, 20.4.0+, 21.2.0+, and 22.0.0+.
Deletes orphaned dataset versions (versions that Dremio is not using) that are older than the specified number of days. If you do not specify a threshold, the default of 2147483647 days is used; because no dataset versions are that old, the default effectively deletes nothing. We recommend a reasonable value, such as 7, 14, or 30.
dremio-admin clean -d=7
dremio-admin clean --delete-orphan-datasetversions=7
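Before scripting -d across clusters running different releases, you might check each version against the release list at the start of this subsection. This is a sketch under stated assumptions: each listed version is treated as a minimum within its own release line, and 19.7.x, which the list does not mention, is assumed not to include the option. The function name supports_delete_orphan_datasetversions is hypothetical.

```python
import re


def supports_delete_orphan_datasetversions(version: str) -> bool:
    """Check a Dremio version string against the documented release list.

    Assumptions: listed versions are minimums within their release lines,
    and 19.7.x (not listed) does not include the option.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    v = tuple(int(part) for part in match.groups())
    major, minor = v[0], v[1]
    if major >= 22:
        return True
    if major == 21:
        return v >= (21, 2, 0)
    if major == 20:
        return v >= (20, 4, 0)
    if major == 19:
        # 19.6.3+ covers later 19.6 patches; 19.8.0+ covers 19.8 and later.
        return v >= (19, 8, 0) or (minor == 6 and v >= (19, 6, 3))
    return False


print(supports_delete_orphan_datasetversions("20.3.1"))  # False
print(supports_delete_orphan_datasetversions("24.3.2-202401241821490114-a1b2c3d4"))  # True
```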
Multiple Options
Running individual clean commands with a single option per command makes it easier to inspect the impact of each action. However, you can run the clean command with more than one option at a time.
For example, the following command compacts metadata, deletes jobs older than 7 days, and deletes orphaned dataset versions older than 7 days:
dremio-admin clean -c -j=7 -d=7
dremio-admin clean --compact --max-job-days=7 --delete-orphan-datasetversions=7
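If you follow the approach of one option per command, a small script can run each step separately and stop if one fails. This is a minimal sketch assuming dremio-admin is on the PATH and exits with a non-zero status on failure; the function run_clean_steps and its dry_run parameter are hypothetical. In dry-run mode (the default here) it only prints and returns the commands it would execute.

```python
import shlex
import subprocess
from typing import List, Sequence


def run_clean_steps(options: Sequence[str], dry_run: bool = True) -> List[List[str]]:
    """Run `dremio-admin clean` once per option, stopping at the first failure."""
    commands = [["dremio-admin", "clean", option] for option in options]
    for command in commands:
        print("$", shlex.join(command))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status,
            # so later steps do not run after a failed one.
            subprocess.run(command, check=True)
    return commands


steps = run_clean_steps(["--compact", "--max-job-days=7", "--delete-orphan-datasetversions=7"])
```

Between steps, you could run dremio-admin clean without options to compare metadata store statistics, since that mode only reads from the store.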
Report Metadata Statistics
If you do not specify any options, the output of the clean command is statistics about the metadata store. The statistics include the estimated key count, estimated total in-memory size, and total file size for each category of object in the store. Running the clean command without options is a read-only operation and does not clean any metadata.
dremio-admin clean