Wikis and Labels

Wikis and labels help users document, organize, and discover datasets within the Open Catalog. This page explains how to manage wikis and labels, as well as how Dremio’s Generative AI features can assist in generating wikis and labels for you.

Wikis

Wikis for datasets provide an efficient way to document and describe datasets within the Open Catalog. They let users add comprehensive information, context, and relevant details about the datasets they manage. The rich text editor supports GitHub-flavored Markdown, so users can format content easily and improve readability. Wikis keep dataset documentation accessible and structured, making it simpler for teams to understand the datasets and how to work with them effectively.

This image shows an example of the Wiki editor in Dremio.

Manage Wikis

note

Ensure you have sufficient Role-Based Access Control (RBAC) privileges to view or edit wikis.

To view or edit the wiki for a dataset in the Dremio console:

  1. On the Datasets page, navigate to the folder where your dataset is stored.
  2. Hover over your dataset, and on the right-hand side, click the more actions icon.
  3. Click Open Details Panel.
    • You can edit the dataset wiki by clicking Edit Wiki, writing your wiki content, and clicking Save.
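
Because the wiki editor accepts GitHub-flavored Markdown, wiki content can be drafted outside the console and pasted in when you click Edit Wiki. The following is a minimal sketch of such a draft generator, not a Dremio feature or API: the `Column` type, the `draft_wiki` function, and the sample dataset are all hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class Column(NamedTuple):
    """One column of a dataset schema (hypothetical helper type)."""
    name: str
    type: str
    description: str


def draft_wiki(dataset: str, summary: str, columns: list[Column]) -> str:
    """Return a GitHub-flavored Markdown draft for a dataset wiki."""
    lines = [
        f"# {dataset}",
        "",
        summary,
        "",
        "## Columns",
        "",
        # GitHub-flavored Markdown table: header row, then a delimiter row.
        "| Column | Type | Description |",
        "| --- | --- | --- |",
    ]
    for col in columns:
        lines.append(f"| `{col.name}` | {col.type} | {col.description} |")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sample values are made up for illustration.
    print(draft_wiki(
        "customers",
        "One row per customer. Contains PII.",
        [
            Column("customer_id", "BIGINT", "Unique customer identifier"),
            Column("email", "VARCHAR", "Contact email address (PII)"),
        ],
    ))
```

The printed text can be pasted into the wiki editor and adjusted there before you click Save.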

Labels

Labels for datasets offer a way to organize and retrieve datasets within a data catalog. By creating and assigning labels to datasets, users can easily search and filter through large collections of related datasets. Labels also enhance the search experience: clicking a label initiates a search that returns all datasets linked to that label, which makes it faster to find relevant data.

The following image shows a dataset in the catalog with several labels and a brief wiki. In this example, the label "pii-data" was entered in the search field to narrow the results to a customer dataset that contains Personally Identifiable Information (PII).

This image shows an example of creating labels.
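
Conceptually, searching by label is a filter over the catalog: keep every dataset whose label set contains the requested label. The sketch below models that idea in memory. It is an illustration only, not Dremio's search implementation; the `datasets_with_label` function and the sample catalog are hypothetical, and the exact, case-sensitive match is an assumption.

```python
def datasets_with_label(catalog: dict[str, set[str]], label: str) -> list[str]:
    """Return the names of datasets that carry the given label, sorted by name.

    Assumption: labels match exactly and case-sensitively.
    """
    return sorted(name for name, labels in catalog.items() if label in labels)


# Hypothetical catalog: dataset name -> labels assigned to it.
catalog = {
    "sales.customers": {"pii-data", "crm"},
    "sales.orders": {"finance"},
    "hr.employees": {"pii-data"},
}

matches = datasets_with_label(catalog, "pii-data")
print(matches)  # ['hr.employees', 'sales.customers']
```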

Manage Labels

note

Ensure you have sufficient Role-Based Access Control (RBAC) privileges to view or edit labels.

To view or edit the labels for a dataset in the Dremio console:

  1. On the Datasets page, navigate to the folder where your dataset is stored.
  2. Hover over your dataset, and on the right-hand side, click the more actions icon.
  3. Click Open Details Panel.
    • You can add a label by clicking on the icon, typing a label name (e.g. PII), and pressing Enter.

Generate Labels and Wikis (Preview)

To help eliminate the need for manual profiling and cataloging, you can use Generative AI to generate labels and wikis for your datasets.

note

If you haven't opted into the Generative AI features, see Dremio Preferences for the steps to enable them.

Generate Labels

To generate a label, Generative AI bases its understanding on your schema, taking into account labels that were previously generated and labels that were created by other users.

To generate labels:

  1. Navigate to either the Details page or Details Panel of a dataset.

  2. In the Dataset Overview on the right, click the Generative AI icon to generate labels.

  3. In the Generating labels dialog, review the labels generated for the dataset and decide which to save. If multiple labels have been generated, you can save some, all, or none of them. To remove a label, click the x on it.

This screenshot shows how to generate a label.

  4. Complete one of the following actions:

    • If these are the only labels for your dataset, click Save.

    • If you already have labels for the dataset and want to add these generated labels, click Append.

    • If you already have labels for the dataset and want to replace them with these generated labels, click Overwrite.

    The labels for the dataset will appear in the Dataset Overview.
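
The three actions differ only in how the generated labels are combined with the labels a dataset already has. The following sketch models those outcomes as plain list operations. It is not Dremio's implementation: the `apply_generated_labels` function is hypothetical, and dropping duplicates while preserving order is an assumption about the result.

```python
def _dedupe(labels: list[str]) -> list[str]:
    """Drop repeated labels while keeping first-seen order."""
    return list(dict.fromkeys(labels))


def apply_generated_labels(
    existing: list[str], generated: list[str], action: str
) -> list[str]:
    """Return the dataset's labels after applying 'save', 'append', or 'overwrite'."""
    if action == "save":
        # Save is described for the case where the dataset has no labels yet.
        if existing:
            raise ValueError("save applies only when the dataset has no labels yet")
        return _dedupe(generated)
    if action == "append":
        # Keep existing labels and add the generated ones after them.
        return _dedupe([*existing, *generated])
    if action == "overwrite":
        # Discard existing labels and keep only the generated ones.
        return _dedupe(generated)
    raise ValueError(f"unknown action: {action!r}")


print(apply_generated_labels(["finance"], ["pii-data", "finance"], "append"))
# ['finance', 'pii-data']
print(apply_generated_labels(["finance"], ["pii-data"], "overwrite"))
# ['pii-data']
```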

Generate Wikis

To generate a wiki, Generative AI uses your schema and data to determine how the columns within the dataset relate to each other and to the dataset as a whole, and produces a description of the dataset from that understanding.

You can generate wikis only if you are the dataset owner or have ALTER privileges on the dataset.

To generate a wiki:

  1. Navigate to either the Details page or Details Panel of a dataset.

  2. In the Wiki section, click Generate wiki. A dialog opens, and a preview of the wiki content is generated on the right of the dialog. To regenerate the preview, click the regenerate icon.

This screenshot shows how to generate wikis.

  3. Click the copy icon to copy the generated wiki content on the right of the dialog.

  4. Click within the text box on the left and paste the wiki content.

  5. (Optional) Use the toolbar to edit the wiki content. To regenerate the wiki content in the preview, click the Generative AI icon in the toolbar.

  6. Click Save.

The wiki for the dataset will appear in the Wiki section.