Open Catalog

Dremio's Open Catalog is a built-in lakehouse catalog powered by Apache Polaris. It provides centralized, secure access to your Iceberg tables while automating data maintenance to keep performance optimized.

Key Capabilities

  • Comprehensive Access Controls – Protect your data with Role-Based Access Control (RBAC) alongside fine-grained security policies. RBAC privileges are enforced within the catalog itself, so they remain fully enforced when the catalog is shared with other projects or engines. Apply row filters to limit data visibility by criteria such as region, or use column masks to obfuscate sensitive information such as Social Security numbers.
  • Automatic Table Maintenance – Open Catalog handles Iceberg table compaction and vacuum operations automatically, so you get optimal query performance and lower storage costs without manual intervention. These table maintenance jobs run on a dedicated engine that requires no routing rules, engine configuration, or scheduling.
  • Analyst-Friendly Data Discovery – Built-in data product capabilities make it easy for analysts to find and understand data. Use semantic search to discover datasets with natural language, leverage descriptions and labels to understand business context, and explore lineage graphs to trace data transformations and assess downstream impact.
  • Multi-Engine Compatibility – Access your catalog from any query engine or framework that supports the Iceberg REST API. Ingest data using Spark or Flink, then leverage Dremio to curate and deliver refined data products—all working from the same catalog.
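
As an illustration of the multi-engine capability, the sketch below assembles the Spark properties that Apache Iceberg uses to register a REST catalog. The property keys are standard Iceberg Spark catalog settings. The helper name, catalog name, endpoint URL, warehouse name, and token are hypothetical placeholders rather than values from Dremio's documentation, so substitute the ones that apply to your project.

```python
def iceberg_rest_spark_conf(catalog_name, uri, warehouse, token):
    """Return Spark properties that register an Iceberg REST catalog.

    All argument values are supplied by the caller; nothing here is
    specific to a particular Dremio deployment.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg SQL extensions (for example, MERGE INTO) in Spark.
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        # Registers the catalog under `catalog_name` and selects the REST client.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
        f"{prefix}.warehouse": warehouse,
        # Bearer token sent with each REST request.
        f"{prefix}.token": token,
    }


# Hypothetical values for illustration only.
conf = iceberg_rest_spark_conf(
    catalog_name="open_catalog",
    uri="https://catalog.example.com/api/catalog",
    warehouse="my_project",
    token="<personal-access-token>",
)

for key in sorted(conf):
    print(f"{key}={conf[key]}")
```

Each pair can be passed to Spark as a `--conf key=value` option or set through `SparkSession.builder.config(key, value)`.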

Every project in your Dremio organization includes an Open Catalog by default. The catalog is provisioned automatically when you create a project, so it is ready to use immediately and gives you access to your Iceberg tables with full control over security, maintenance, and data organization.

Create a Namespace

Namespaces help you organize tables logically within your Open Catalog. You might create namespaces by team (Engineering, Revenue), by domain (Finance, Marketing), or by use case.

To create a namespace:

  1. In the Data panel, click the Add icon next to Namespaces.
  2. Enter a namespace name.
  3. Click Create.

Your namespace is now ready for tables. You can create tables within the namespace using SQL or by uploading data through the Dremio console.

Naming tips:

  • Use lowercase letters, numbers, and underscores for maximum compatibility across engines.
  • Choose descriptive names that clearly indicate the namespace's purpose (e.g., customer_analytics, finance_reporting).
  • Avoid spaces or special characters that may require escaping in SQL queries.
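
The naming tips above can be checked mechanically before a namespace is created. The following is a minimal sketch of such a check. The function name and the pattern are illustrative assumptions based only on the tips listed here, not a rule that Dremio itself enforces.

```python
import re

# Lowercase letters, digits, and underscores only, per the naming tips.
_NAME_PATTERN = re.compile(r"[a-z0-9_]+")


def follows_naming_tips(name: str) -> bool:
    """Return True if `name` uses only lowercase letters, digits, and underscores."""
    return _NAME_PATTERN.fullmatch(name) is not None


for candidate in ["customer_analytics", "finance_reporting", "Finance Reports", "sales-2024"]:
    print(candidate, follows_naming_tips(candidate))
```

Names that fail the check can still be valid in an engine, but they may need quoting or escaping in SQL.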

Observe Table Maintenance

Open Catalog automatically performs maintenance operations such as compaction and vacuum to optimize query performance. You can monitor these jobs to understand maintenance activity.

To view maintenance jobs:

  1. In the Dremio console, click the Jobs icon in the side navigation bar to open the Jobs page.
  2. Select the Internal job type.
  3. Review jobs with engine type MAINTENANCE.

Add Catalogs from Other Projects

In addition to your Open Catalog, you can connect to Open Catalogs from other projects in your organization. When you add a catalog from another project, it appears as a source in your project, enabling you to access shared data assets while maintaining consistent security and governance. All Role-Based Access Control (RBAC) privileges and fine-grained access controls are enforced at the catalog level, ensuring secure data access across projects.

Catalogs from other projects enable:

  • Cross-Project Collaboration – Access tables from other teams without duplicating data or managing separate copies.
  • Centralized Governance – Data owners maintain control over access policies while enabling broad data sharing.
  • Consistent Security – RBAC and fine-grained controls travel with the catalog, so permissions are enforced regardless of which project accesses the data.
  • Simplified Data Discovery – Users can browse and query shared data assets directly from their own project workspace.

To add a catalog from another project:

  1. In the Datasets panel, click the Add Source icon next to Sources.
  2. In the Add Data Source dialog, under Lakehouse Catalogs, select Open Catalog.
  3. In the Name field, choose the project hosting the desired catalog from the dropdown menu.
  4. Click Save.

The catalog now appears under Lakehouse Catalogs in your Sources panel. You can browse its namespaces and query tables just as you would with your own catalog, with all access controls enforced automatically.

The dropdown lists only projects whose Open Catalog you have been granted access to. If a project you expect is missing, contact the project owner to request access.

Catalog Settings

The default catalog configurations work well for most use cases. If you need to adjust them:

  1. Click Settings on the left navigation bar and choose Project Settings.
  2. Select Catalog to view the catalog settings page.

For catalogs from other projects:

  1. Select the catalog from the Lakehouse Catalogs section of Sources on the Datasets panel.
  2. Select Settings from the dropdown menu.
  3. Select from the available tabs for additional configurations.

Reflection Refresh

Control how often Reflections are automatically refreshed and when they expire. These settings are specific to each project using the catalog.

Refresh Settings

  • Never refresh: Prevent automatic Reflection refresh. By default, Reflections refresh automatically.
  • Refresh every: Set the refresh interval in hours, days, or weeks. Ignored if Never refresh is selected.
  • Set refresh schedule: Specify a daily or weekly refresh schedule.

Expire Settings

  • Never expire: Prevent Reflections from expiring. By default, Reflections expire after the configured time limit.
  • Expire after: The time limit after which Reflections are removed from Dremio, specified in hours, days, or weeks. Ignored if Never expire is selected.
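
The precedence between these options ("Ignored if Never refresh is selected", "Ignored if Never expire is selected") can be sketched as a small model. The class and field names below are hypothetical and exist only to illustrate the rule; they do not correspond to any Dremio API.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class ReflectionPolicy:
    """Illustrative model of the refresh and expire settings (hypothetical names)."""

    never_refresh: bool = False
    refresh_every: Optional[timedelta] = None
    never_expire: bool = False
    expire_after: Optional[timedelta] = None

    def effective_refresh(self) -> Optional[timedelta]:
        # "Refresh every" is ignored when "Never refresh" is selected.
        return None if self.never_refresh else self.refresh_every

    def effective_expiry(self) -> Optional[timedelta]:
        # "Expire after" is ignored when "Never expire" is selected.
        return None if self.never_expire else self.expire_after


policy = ReflectionPolicy(
    never_refresh=True,
    refresh_every=timedelta(hours=1),
    expire_after=timedelta(hours=3),
)
print(policy.effective_refresh())
print(policy.effective_expiry())
```

Here the one-hour interval is configured but has no effect because Never refresh is selected, while the three-hour expiry still applies.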

Metadata

Configure how Dremio handles dataset definitions and metadata refresh. These settings are specific to each project using the catalog.

In Open Catalog, metadata refresh serves two purposes:

  • Cache Refresh: Dremio maintains a project-level cache of table metadata to accelerate query planning and execution. Writes from Dremio query engines automatically update this cache. However, writes from other query engines only update snapshot metadata in object storage. Metadata refresh syncs these external changes into Dremio's cache to improve subsequent query performance.
  • Lineage Computation: Metadata refresh recomputes lineage information to reflect the latest changes in lineage graphs.

Dataset Handling

  • Remove dataset definitions if the underlying data is unavailable (Default) – When selected, Dremio removes dataset definitions if the underlying files are deleted or the folder/source becomes inaccessible. When deselected, Dremio retains dataset definitions even when data is unavailable. This is useful when files are temporarily deleted and replaced with new files.

Dataset Discovery

  • Fetch every: How often to refresh top-level source object names (databases and tables). Set the interval in minutes, hours, days, or weeks. Default: 1 hour.

Dataset Details

Metadata Dremio needs for query planning, including field information, types, shards, statistics, and locality.

  • Fetch mode: Fetch metadata only for queried datasets, so Dremio updates details only for objects in the source that have previously been queried. Default: Only Queried Datasets.
  • Fetch every: How often to fetch dataset details, specified in minutes, hours, days, or weeks. Default: 1 hour.
  • Expire after: When dataset details expire, specified in minutes, hours, days, or weeks. Default: 3 hours.
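
To show how the two default intervals relate, the sketch below classifies cached dataset details by age. It is one simplified reading of the settings above, using the documented defaults of 1 hour and 3 hours. The function name and the three state labels are assumptions made for illustration, not Dremio terminology.

```python
from datetime import datetime, timedelta

# Documented defaults for Dataset Details.
FETCH_EVERY = timedelta(hours=1)
EXPIRE_AFTER = timedelta(hours=3)


def metadata_state(last_fetched, now, fetch_every=FETCH_EVERY, expire_after=EXPIRE_AFTER):
    """Classify cached dataset details by the time elapsed since the last fetch."""
    age = now - last_fetched
    if age >= expire_after:
        return "expired"
    if age >= fetch_every:
        return "due_for_refresh"
    return "fresh"


now = datetime(2024, 1, 1, 12, 0)
for minutes in (30, 90, 240):
    print(minutes, metadata_state(now - timedelta(minutes=minutes), now))
```

Under this reading, keeping Expire after longer than Fetch every leaves room for a refresh to run before the cached details expire.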

Privileges

Grant access to specific users or roles. See Privileges for additional information about privileges.

To grant access:

  1. Under Privileges, enter the user name or role name you want to grant access to and click Add to Privileges. The user or role appears in the USERS/ROLES table.
  2. In the USERS/ROLES table, toggle the checkbox for each privilege you want to grant.
  3. Click Save after configuring all settings.

Delete an Open Catalog Connection

To delete a catalog connection from another project:

  1. On the Datasets page, click Sources > Lakehouse Catalogs in the Data panel.
  2. In the list of data sources, hover over the source you want to remove and right-click.
  3. From the list of actions, click Delete.
  4. In the Delete Source dialog, click Delete to confirm removal.

You cannot delete your default Open Catalog because it is a core component of your project.

Note: If the source is in a bad state (for example, Dremio cannot authenticate to the source or the source is otherwise unavailable), only users who belong to the ADMIN role can delete the source.