Skip to main content

Pillar 6: Self-Serve Semantic Layer

Dremio Cloud has a unique capability in its semantic layer, which is where the magic happens in mapping the physical structure of the underlying data storage to how the data is consumed via SQL queries. When you optimally design and maintain the semantic layer, data are more discoverable, writing queries is more straightforward, and performance is optimized.

Principles

Layer Views

Layering your views allows you to balance security, performance, and usability. Layered views help you expose the data in your physical tables to external consumption tools in the format the tools require, with proper security and performance. A well-architected semantic layer consists of three layers to organize your views: preparation, business, and application. Each layer serves a purpose in transforming data for consumption by external tools.

Annotate Datasets to Enhance Discovery and Understanding

You can label and document datasets within Dremio Cloud to make data more discoverable and verifiable and allow you to apply governance.

Best Practices

Use the Preparation Layer to Map 1-1 to Tables

The preparation layer is closest to the data source. This layer is used to organize and expose only the required datasets from the source rather than all datasets the source contains. In the preparation layer, each view is mapped to the table that it is derived from in the data source, and there are no joins to other views.

Typically, a data engineer is responsible for preparing the data in the preparation layer. The data engineer should apply column aliasing so that all downstream views can use the normalized column names. Casting column data types should also be done in the preparation layer so that all higher-level views can leverage the correct type and conversion is done only once. Data should be cleansed in the preparation layer for central management and to ensure that all downstream views use clean data. Derived columns based on existing columns should be configured in the preparation layer so that all future layers can use the new columns.

Use the Business Layer to Logically Join Datasets

The business layer provides a holistic view of all data across your Arctic catalog or folder. It is the first layer where joins among and between sources should occur. All views in the business layer must be built by either querying resources in the preparation layer or querying other resources in the same business layer.

  • Querying resources in the preparation layer: views in the business layer should start with selecting all columns from the preparation layer of that view. This is typically a 1-1 mapping between the preparation and business layer view.

  • Querying other resources in the same business layer: when joining two views together, they should be joined from the business layer representation of the view, not the preparation layer. This allows all changes made in the business layer to propagate to all joins.

Use your list of common terms to describe the key business entities in your organization, such as customer, product, and order. Typically, a data modeler works with business experts and data providers to define the views that represent the business entities.

You can create many sub-layers inside the business layer, each consisting of views for different subject areas or verticals. These views are reusable components that can and should be shared across business lines. Typically, views do not filter rows or columns in the business layer; this is deferred to the application layer.

Use the business layer to improve productivity for analytics initiatives and minimize the risk of duplicative efforts in your organization by reducing the cost of service delivery to lines of business, providing a self-service model for data engineers to quickly provision datasets, and enabling data consumers to quickly use and share datasets.

Use the Application Layer to Arrange Datasets for Consumption

Application layer views are arranged for the needs of data consumers and organizational departments. Typically, data consumers like analysts and data scientists use the views from the business layer and work directly in the application layer to create and modify views in their own dashboards.

If the application layer provides self-service access to Dremio Cloud’s semantic layer, you should expose all business layer views in the application layer at minimum. Even if the view is created by running SELECT * from BUSINESS_VIEW, it provides logical separation for security and performance improvements.

If the application layer is not for self-service but for particular applications, the views in the application layer should be built on top of those self-service views in the application layer, adding any application-specific logic. Application logic should be row filters as needed by the application. Columns can be left as-is, and the list of columns the application selects are reduced in the SQL query.

Leverage Labels to Enhance Searchability

Use Dremio Cloud’s label functionality to create and assign labels to tables and views to group related objects and enhance the discoverability of data across your organization. You can search for sets of tables and views based on a label or click on a label in the Dremio console to start a search based on it. Objects can have multiple labels so that they can belong to different logical groups.

Create Wiki Content to Describe Datasets

Use Dremio Cloud’s wiki functionality to add descriptions for Arctic catalogs, sources, folders, tables, and views. Wikis enhance understanding of data inside your organization. Wikis allow you to provide context for datasets, such as descriptions for each column, and content that helps users get started with the data, such as usage examples, notes, and points of contact for questions or issues.

Dremio wikis use GitHub-Flavored Markdown and are supported by a rich text editor.

To help eliminate the need for labor-intensive manual classification and cataloging, you can use generative AI to generate labels and wikis for your datasets. Enabling the generative AI feature in Dremio allows you to generate a detailed description of each dataset’s purpose and schema. Dremio's generative AI bases its understanding on your schema and data to produce descriptions of datasets because it can determine how the columns within the dataset relate to each other and to the dataset as a whole.