dbt

dbt enables analytics engineers to transform their data using the same practices that software engineers use to build applications.

You can use dbt-dremio, Dremio's dbt connector, to transform data in the data sources that are connected to a Dremio project.

Prerequisites

  • Download the dbt-dremio package from https://github.com/dremio/dbt-dremio.
  • Ensure that Python 3.9.x or later is installed.
  • Before connecting from a dbt project to Dremio, follow these prerequisite steps:
    • Ensure that you have the ID of the Dremio project that you want to use. See Obtain the ID of a Project.
    • Ensure that you have a personal access token (PAT) for authenticating to Dremio. See Create a PAT.

Install

Install the package from PyPI by running this command:

Install dbt-dremio package
pip install dbt-dremio
Note: dbt-dremio supports dbt-core versions 1.8 and 1.9 only. Earlier versions of dbt-core are not officially supported.
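
The version requirements stated on this page (Python 3.9 or later, dbt-core 1.8-1.9) can be checked before you initialize a project. The following is a minimal sketch and is not part of dbt-dremio. The function names are hypothetical, and the check only looks at the major and minor version numbers.

```python
import sys
from importlib import metadata


def python_ok(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.9 or later."""
    return tuple(version_info[:2]) >= (3, 9)


def dbt_core_ok(version):
    """Return True when a dbt-core version string is in the 1.8 or 1.9 series."""
    parts = version.split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        return False
    return (major, minor) in {(1, 8), (1, 9)}


def installed_dbt_core_version():
    """Return the installed dbt-core version string, or None if it is not installed."""
    try:
        return metadata.version("dbt-core")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_dbt_core_version()
    print("Python 3.9+:", python_ok())
    print("dbt-core version:", found)
    print("dbt-core in 1.8-1.9:", found is not None and dbt_core_ok(found))
```

Run the script in the same environment in which you ran pip install, so that it inspects the packages that dbt actually uses.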

Initialize a dbt Project

  1. Run the command dbt init <project_name>.
  2. Select dremio as the database to use.
  3. Select the dremio_cloud option.
  4. Provide a value for cloud_host.
  5. Enter your username, PAT, and the ID of your Dremio project.
  6. Select the enterprise_catalog option.
  7. For enterprise_catalog_namespace, enter the name of an existing namespace within the catalog.
  8. For enterprise_catalog_folder, enter the name of a folder which already exists within the namespace.

For descriptions of the configurations in the above steps, see Configurations.

After you complete these steps, you have a profile for your new dbt project. The profile is stored in a file that is typically named profiles.yml.

You can edit this file to define multiple targets, one for each Dremio configuration. A common pattern is to have a dev target where a dbt project is tested and a prod target to which model changes are promoted after testing:

Example Profile
[project name]:
  outputs:
    dev:
      cloud_host: api.dremio.cloud
      cloud_project_id: 1ab23456-78c9-01d2-de3f-456g7h890ij1
      enterprise_catalog_folder: sales
      enterprise_catalog_namespace: dev
      pat: A1BCDrE2FwgH3IJkLM4123qrsT5uV6WXyza7I8bcDEFgJ9hIj0Kl1MNOPq2Rstu==
      threads: 1
      type: dremio
      use_ssl: true
      user: name@company.com
    prod:
      cloud_host: api.dremio.cloud
      cloud_project_id: 1ab23456-78c9-01d2-de3f-456g7h890ij1
      enterprise_catalog_folder: sales
      enterprise_catalog_namespace: prod
      pat: A1BCDrE2FwgH3IJkLM4123qrsT5uV6WXyza7I8bcDEFgJ9hIj0Kl1MNOPq2Rstu==
      threads: 1
      type: dremio
      use_ssl: true
      user: name@company.com
  target: dev

Note that the target value in the profiles.yml file can be overridden when you invoke dbt run.

Specify target for dbt run command
dbt run --target <target_name>
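
If you promote changes from one target to the next, the --target override can be scripted. The following is a minimal sketch that runs dbt against each target in order and stops at the first failure. The function names are hypothetical, and the sketch assumes that the dbt executable is on your PATH and that the script runs from the dbt project directory.

```python
import subprocess


def dbt_run_command(target):
    """Build the argument list for `dbt run --target <target>`."""
    if not target or target.startswith("-"):
        raise ValueError(f"not a valid target name: {target!r}")
    return ["dbt", "run", "--target", target]


def run_in_order(targets, runner=subprocess.run):
    """Run dbt against each target in turn and stop at the first failure.

    Returns the list of targets whose run exited with status 0.
    """
    done = []
    for target in targets:
        result = runner(dbt_run_command(target))
        if result.returncode != 0:
            break
        done.append(target)
    return done


if __name__ == "__main__":
    # Target names match the example profile: test on dev, then promote to prod.
    print("completed:", run_in_order(["dev", "prod"]))
```

The runner parameter exists only so that the function can be exercised without invoking dbt; by default it calls subprocess.run.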

Configurations

| Configuration | Required | Default Value | Description |
| --- | --- | --- | --- |
| cloud_host | Yes | api.dremio.cloud | US Control Plane: api.dremio.cloud. EU Control Plane: api.eu.dremio.cloud. |
| cloud_project_id | Yes | None | The ID of the Dremio project in which to run transformations. |
| enterprise_catalog_namespace | Yes | None | The namespace in which to create tables, views, etc. The dbt aliases are datalake (for objects) and database (for views). |
| enterprise_catalog_folder | Yes | None | The path in the catalog in which to create catalog objects. The dbt aliases are root_path (for objects) and schema (for views). Nested folders in the path are separated with periods. |
| pat | Yes | None | The personal access token to use for authentication. See Personal Access Tokens for instructions about obtaining a token. |
| threads | Yes | 1 | The number of threads the dbt project runs on. |
| type | Yes | dremio | Auto-populated when creating a Dremio project. Do not change this value. |
| use_ssl | Yes | true | The value must be true. |
| user | Yes | None | Email address used as a username in Dremio. |
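
The rules in the table above can be applied to a target before you run dbt. The following is a minimal sketch, not a dbt-dremio feature. The function name check_target is hypothetical, and the target is assumed to be already loaded from profiles.yml into a plain dictionary. The sketch checks only what the table states: the required keys, the two documented control plane hosts, the fixed type value, and use_ssl being true.

```python
REQUIRED_KEYS = (
    "cloud_host",
    "cloud_project_id",
    "enterprise_catalog_namespace",
    "enterprise_catalog_folder",
    "pat",
    "threads",
    "type",
    "use_ssl",
    "user",
)

# Hosts listed in the configuration table (US and EU control planes).
KNOWN_HOSTS = {"api.dremio.cloud", "api.eu.dremio.cloud"}


def check_target(target):
    """Return a list of problems found in one target mapping (empty if none)."""
    problems = [
        f"missing required key: {key}" for key in REQUIRED_KEYS if key not in target
    ]
    if "cloud_host" in target and target["cloud_host"] not in KNOWN_HOSTS:
        problems.append(f"unrecognized cloud_host: {target['cloud_host']}")
    if "type" in target and target["type"] != "dremio":
        problems.append("type must be dremio")
    if "use_ssl" in target and target["use_ssl"] is not True:
        problems.append("use_ssl must be true")
    return problems


if __name__ == "__main__":
    # Placeholder values; substitute your own project ID, PAT, and user.
    dev = {
        "cloud_host": "api.dremio.cloud",
        "cloud_project_id": "<project_id>",
        "enterprise_catalog_namespace": "dev",
        "enterprise_catalog_folder": "sales",
        "pat": "<pat>",
        "threads": 1,
        "type": "dremio",
        "use_ssl": True,
        "user": "name@company.com",
    }
    for problem in check_target(dev):
        print(problem)
```

An empty list means that the target satisfies these checks; it does not confirm that the project ID, PAT, or user are valid in Dremio.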

Known Limitations

Model contracts are not supported.