Key Concepts

This page defines core Dremio concepts.

Platform

The platform provides the foundational organizational structure for Dremio. It establishes the account hierarchy through organizations and projects, and enables administrators to control user access and allocate resources.

Organizations and Projects

An organization is the top-level account within Dremio where authentication, roles, AI configuration for Model Providers, and billing are managed. An organization can contain multiple projects.

A project isolates compute, data, and resources for team-based data analysis. Projects provide the primary boundary for resource allocation and access control.

When creating a project, you can either use Dremio-managed storage or provide your own object storage as the project store. The project store is where Dremio stores materializations, metadata, and Iceberg tables created in your Open Catalog.

Roles and Permissions

Roles define what actions users can perform within Dremio. Permissions control access to specific resources like projects, catalogs, tables, and views. Administrators assign roles to users to manage who can view, create, modify, or delete data objects and configurations.
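The relationship between users, roles, and permissions can be illustrated with a small toy model. This is only a sketch of the general role-based access idea: the `Role` class, the `is_allowed` helper, the action names `view` and `modify`, and the users are all hypothetical and are not part of Dremio's API or its actual privilege model.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """A named set of (resource, action) grants. Purely illustrative."""
    name: str
    grants: set = field(default_factory=set)

    def allow(self, resource: str, action: str) -> None:
        # Record that this role may perform `action` on `resource`.
        self.grants.add((resource, action))


def is_allowed(roles, resource: str, action: str) -> bool:
    """Return True if any of the given roles grants the action on the resource."""
    return any((resource, action) in role.grants for role in roles)


# Hypothetical roles and grants, for illustration only.
analyst = Role("analyst")
analyst.allow("my_catalog.usage.daily_usage", "view")

editor = Role("editor")
editor.allow("my_catalog.usage.daily_usage", "view")
editor.allow("my_catalog.usage.daily_usage", "modify")

# An administrator assigns roles to users.
user_roles = {"alice": [analyst], "bob": [analyst, editor]}

alice_can_modify = is_allowed(user_roles["alice"], "my_catalog.usage.daily_usage", "modify")
bob_can_modify = is_allowed(user_roles["bob"], "my_catalog.usage.daily_usage", "modify")
print(alice_can_modify)  # False
print(bob_can_modify)    # True
```

In this model a user's effective permissions are the union of the grants of all assigned roles, which is why adding the hypothetical `editor` role is enough to let `bob` modify the table.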

Catalog

The catalog enables unified data access across heterogeneous sources without requiring data movement or ETL processes.

Open Catalog

Dremio's Open Catalog is a metadata and data management layer built on Apache Polaris. It provides a unified namespace for organizing and accessing data across your Dremio environment with Apache Iceberg support.

Namespaces and Folders

A namespace is the top-level container within the Open Catalog that organizes data objects. The catalog name corresponds to your project name, and namespaces are the primary organizational boundary for tables, views, and folders within that catalog.

Folders are directories that contain tables, views, and other folders. Use folders to organize your data into common themes, such as data quality (raw, enrichment, and presentation layers), business units, or geographic regions. Folders can be organized hierarchically for better data governance.

Tables and Views

Tables contain data from your sources, formatted as rows and columns. Tables in the Open Catalog use the Iceberg table format, and Dremio automates maintenance processes including compaction and garbage collection.

Views are virtual tables based on SQL queries. Views do not contain data but provide logical abstractions over tables, other views, or combinations of both. Views leverage the Iceberg view specification for portability across different query engines.
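The difference between a table and a view can be seen in any SQL engine. The sketch below uses Python's built-in `sqlite3` module rather than Dremio, so it says nothing about Iceberg or Dremio-specific syntax; the table name, view name, and sample rows are made up for illustration. It only demonstrates the general point that a view stores a query definition, not data.

```python
import sqlite3

# In-memory SQLite database used as a stand-in SQL engine (not Dremio).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_usage (region TEXT, queries INTEGER)")
conn.executemany(
    "INSERT INTO daily_usage VALUES (?, ?)",
    [("emea", 10), ("emea", 5), ("apac", 7)],
)

# The view stores only the query definition, not a copy of the rows.
conn.execute(
    "CREATE VIEW usage_by_region AS "
    "SELECT region, SUM(queries) AS total FROM daily_usage GROUP BY region"
)

before = conn.execute(
    "SELECT region, total FROM usage_by_region ORDER BY region"
).fetchall()

# A new row in the underlying table is visible through the view immediately,
# because the view's query is evaluated when the view is read.
conn.execute("INSERT INTO daily_usage VALUES ('apac', 3)")
after = conn.execute(
    "SELECT region, total FROM usage_by_region ORDER BY region"
).fetchall()

print(before)  # [('apac', 7), ('emea', 15)]
print(after)   # [('apac', 10), ('emea', 15)]
```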

Data Sources

Dremio connects to external systems through data sources without data movement. Supported sources include:

  • Iceberg Catalogs: AWS Glue Data Catalog, Snowflake Open Catalog, Unity Catalog, and Iceberg REST Catalogs
  • Object Storage: Amazon S3 and Azure Storage for data lake workloads
  • Relational Databases: PostgreSQL, MySQL, SQL Server, and other RDBMS systems

Paths

Paths are dot-separated identifiers that specify the location of an object, starting with the source or catalog name, followed by any folders, and ending with the name of the dataset, table, or view. Paths are used to qualify objects when referencing them in queries.

For example, in the path my_catalog.usage.onprem_deployment.daily_usage:

  • my_catalog is the catalog name
  • usage and onprem_deployment are folders within the catalog
  • daily_usage is the table or view name
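The breakdown above can be sketched as a small parser. This is a minimal illustration, not a Dremio API: `ObjectPath` and `parse_path` are hypothetical names, and the sketch assumes that no component contains a dot or needs quoting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectPath:
    """A parsed dot-separated path (hypothetical helper, not a Dremio API)."""
    root: str       # source or catalog name
    folders: tuple  # zero or more folder names, outermost first
    name: str       # dataset, table, or view name


def parse_path(path: str) -> ObjectPath:
    """Split a simple dot-separated path into its components.

    Assumes no component contains a dot or requires quoting.
    """
    parts = path.split(".")
    # A path needs at least a root and an object name, with no empty parts.
    if len(parts) < 2 or any(not p for p in parts):
        raise ValueError(f"expected at least <root>.<name>, got {path!r}")
    return ObjectPath(root=parts[0], folders=tuple(parts[1:-1]), name=parts[-1])


parsed = parse_path("my_catalog.usage.onprem_deployment.daily_usage")
print(parsed.root)     # my_catalog
print(parsed.folders)  # ('usage', 'onprem_deployment')
print(parsed.name)     # daily_usage
```

The first component is always the source or catalog and the last is always the object, so everything in between is treated as the folder chain.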

AI Semantic Layer

Dremio provides multiple ways to discover and understand your data across all connected sources.

Wikis and Labels

Wikis provide detailed descriptions and context for your datasets, like a README for your data. Wikis support Markdown formatting and can include dataset descriptions, source information, and example queries.

Labels enable easy categorization of datasets. For example, add a PII label to indicate personally identifiable information, or Finance to group financial datasets.

Semantic Search

Semantic search enables you to find objects and entities across your data catalog using natural language queries. It searches object names, metadata, wikis, and labels to return relevant results including sources, folders, tables, views, user-defined functions, Reflections, scripts, and jobs.

Dremio's AI Agent

Dremio's AI Agent enables natural language data exploration and analysis. You can ask questions about your data in natural language, and the AI Agent generates SQL queries and provides insights based on your datasets. The AI Agent works with data from all connected sources and can help create views and analyze patterns across your data catalog.

Query Engine

A query engine is a Dremio-managed compute engine that automatically starts, scales, and stops based on query demand. Each query engine consists of one or more replicas made up of executor instances that process queries. Every project includes a default preview query engine, which remains available for essential operations and automatically scales down when idle.

Engines

An engine processes jobs that run queries issued either by users, through a client application or the user interface, or by Dremio itself, for example when Dremio creates a Reflection that a user has defined. Compute resources for an engine are allocated in the cloud associated with the project. All engines in a project are associated with the same cloud.

Engines are automatically started and stopped by the Dremio control plane and can be configured to have multiple replicas for scaling. For more information, see Manage Engines.

Workload Management

Workload management enables you to control how compute resources are allocated and prioritized across different types of queries and users to optimize performance for your specific workloads.

Reflections

Reflections accelerate query performance by providing precomputed and optimized copies of source data or query results. They can be Autonomous or manually managed. For more details, see Accelerate Queries.