dbt
dbt enables analytics engineers to transform their data using the same practices that software engineers use to build applications.
You can use dbt-dremio, Dremio's dbt connector, to transform data in the data sources that are connected to a Dremio project.
Prerequisites
- Download the dbt-dremio package from https://github.com/dremio/dbt-dremio.
- Ensure that Python 3.9.x or later is installed.
- Before connecting a dbt project to Dremio, complete these steps:
  - Ensure that you have the ID of the Dremio project that you want to use. See Obtain the ID of a Project.
  - Ensure that you have a personal access token (PAT) for authenticating to Dremio. See Create a PAT.
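To confirm the Python prerequisite from a script, you can compare `sys.version_info` against 3.9. This is a minimal sketch; the function name `python_is_supported` is hypothetical and is not part of dbt or dbt-dremio.

```python
import sys

# Minimum Python version listed in the prerequisites.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given interpreter version is 3.9 or later (hypothetical helper)."""
    # Compare only (major, minor); patch releases do not affect the requirement.
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} meets the 3.9+ requirement.")
    else:
        print(f"Python {current} is too old; install Python 3.9 or later.")
```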
Install
Install this package from PyPI by running this command:

```shell
pip install dbt-dremio
```
dbt-dremio works only with dbt-core versions 1.8 through 1.9. Earlier versions of dbt-core are not officially supported.
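Running `dbt --version` shows which dbt-core version is installed. If you want the same compatibility check from a script, the sketch below reads the installed version with the standard library's `importlib.metadata` and tests whether it falls in the 1.8 to 1.9 range. The helper `dbt_core_is_supported` is hypothetical, and it assumes standard version strings such as `1.8.7`.

```python
from importlib import metadata

# Supported dbt-core minor releases, per the compatibility note above.
SUPPORTED_DBT_CORE = {(1, 8), (1, 9)}


def dbt_core_is_supported(version: str) -> bool:
    """Return True if a dbt-core version string is 1.8.x or 1.9.x (hypothetical helper)."""
    parts = version.split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        # Unparseable strings are treated as unsupported.
        return False
    return (major, minor) in SUPPORTED_DBT_CORE


if __name__ == "__main__":
    try:
        installed = metadata.version("dbt-core")
    except metadata.PackageNotFoundError:
        print("dbt-core is not installed.")
    else:
        status = "supported" if dbt_core_is_supported(installed) else "not officially supported"
        print(f"dbt-core {installed} is {status} by dbt-dremio.")
```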
Initialize a dbt Project
- Run the command dbt init <project_name>.
- Select dremio as the database to use.
- Select the dremio_cloud option.
- Provide a value for cloud_host.
- Enter your username, PAT, and the ID of your Dremio project.
- Select the enterprise_catalog option.
- For enterprise_catalog_namespace, enter the name of an existing namespace within the catalog.
- For enterprise_catalog_folder, enter the name of a folder which already exists within the namespace.
For descriptions of the configurations in the above steps, see Configurations.
After you complete these steps, you have a profile for your new dbt project, stored in a file that is typically named profiles.yml.
You can edit this file to define multiple targets, one for each Dremio configuration that you want to run against.
A common pattern is to have a dev target where the dbt project is tested and a prod target to which model changes are promoted after testing:
```yaml
[project name]:
  outputs:
    dev:
      cloud_host: api.dremio.cloud
      cloud_project_id: 1ab23456-78c9-01d2-de3f-456g7h890ij1
      enterprise_catalog_folder: sales
      enterprise_catalog_namespace: dev
      pat: A1BCDrE2FwgH3IJkLM4123qrsT5uV6WXyza7I8bcDEFgJ9hIj0Kl1MNOPq2Rstu==
      threads: 1
      type: dremio
      use_ssl: true
      user: name@company.com
    prod:
      cloud_host: api.dremio.cloud
      cloud_project_id: 1ab23456-78c9-01d2-de3f-456g7h890ij1
      enterprise_catalog_folder: sales
      enterprise_catalog_namespace: prod
      pat: A1BCDrE2FwgH3IJkLM4123qrsT5uV6WXyza7I8bcDEFgJ9hIj0Kl1MNOPq2Rstu==
      threads: 1
      type: dremio
      use_ssl: true
      user: name@company.com
  target: dev
```
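In the example profile, the dev and prod outputs differ only in enterprise_catalog_namespace. If you maintain several targets, a small script can generate them from shared settings so that they do not drift apart. This is a hypothetical sketch, not a dbt or dbt-dremio feature: the project name `my_project` and the placeholder values are assumptions, and the renderer handles only the flat string, integer, and boolean values this profile uses.

```python
# Shared settings for every output; values are placeholders, not real credentials.
BASE_OUTPUT = {
    "cloud_host": "api.dremio.cloud",
    "cloud_project_id": "<your-project-id>",
    "enterprise_catalog_folder": "sales",
    "pat": "<your-pat>",
    "threads": 1,
    "type": "dremio",
    "use_ssl": True,
    "user": "name@company.com",
}


def yaml_scalar(value) -> str:
    """Format a simple scalar as it appears in profiles.yml (booleans as true/false)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def render_profile(project: str, namespaces: dict, default_target: str) -> str:
    """Render one profile with an output per target that differs only by namespace."""
    lines = [f"{project}:", "  outputs:"]
    for target, namespace in namespaces.items():
        output = dict(BASE_OUTPUT, enterprise_catalog_namespace=namespace)
        lines.append(f"    {target}:")
        # Sorted keys reproduce the alphabetical order used in the example profile.
        for key in sorted(output):
            lines.append(f"      {key}: {yaml_scalar(output[key])}")
    lines.append(f"  target: {default_target}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_profile("my_project", {"dev": "dev", "prod": "prod"}, "dev"), end="")
```

Keep real PATs out of any generated file that you commit to version control.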
Note that you can override the target value set in profiles.yml when you invoke dbt run:

```shell
dbt run --target <target_name>
```
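The override rule amounts to: use the output named by --target if one is passed, and otherwise use the output named by the profile's target key. The sketch below is a simplified model of that selection for illustration only; `resolve_target` is a hypothetical helper, not dbt's internal code, and it assumes the profile has already been loaded into a dict.

```python
from typing import Optional


def resolve_target(profile: dict, cli_target: Optional[str] = None) -> dict:
    """Return the output to connect with: --target wins, else the profile's target key."""
    name = cli_target or profile["target"]
    try:
        return profile["outputs"][name]
    except KeyError:
        # An undefined target name is an error rather than a silent fallback.
        raise ValueError(f"target {name!r} is not defined under outputs") from None
```

For the example profile, calling `resolve_target(profile)` selects the dev output, and `resolve_target(profile, "prod")` mirrors `dbt run --target prod`.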
Configurations
| Configuration | Required | Default Value | Description |
|---|---|---|---|
| cloud_host | Yes | api.dremio.cloud | The Dremio control plane host. US control plane: api.dremio.cloud. EU control plane: api.eu.dremio.cloud. |
| cloud_project_id | Yes | None | The ID of the Dremio project in which to run transformations. |
| enterprise_catalog_namespace | Yes | None | The namespace in which to create tables, views, and other objects. The dbt aliases are datalake (for objects) and database (for views). |
| enterprise_catalog_folder | Yes | None | The path in the catalog in which to create catalog objects. The dbt aliases are root_path (for objects) and schema (for views). Separate nested folders in the path with periods. |
| pat | Yes | None | The personal access token to use for authentication. See Personal Access Tokens for instructions about obtaining a token. |
| threads | Yes | 1 | The number of threads dbt uses to run the project. |
| type | Yes | dremio | Auto-populated when you create a dbt project with dbt init. Do not change this value. |
| use_ssl | Yes | true | The value must be true. |
| user | Yes | None | The email address used as your Dremio username. |
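Before running dbt debug to test the connection, you might check an output against the table above. The sketch below is a hypothetical helper, not part of dbt-dremio. It assumes the output has already been loaded into a Python dict (for example, with a YAML parser) and checks only the rules the table states: required keys, type set to dremio, use_ssl set to true, one of the two listed control plane hosts, and no empty segments in a period-separated folder path.

```python
# Keys that the Configurations table marks as required.
REQUIRED_KEYS = {
    "cloud_host", "cloud_project_id", "enterprise_catalog_namespace",
    "enterprise_catalog_folder", "pat", "threads", "type", "use_ssl", "user",
}
# Control plane hosts listed in the table.
KNOWN_HOSTS = {"api.dremio.cloud", "api.eu.dremio.cloud"}


def check_output(output: dict) -> list:
    """Return a list of problems in one profiles.yml output; empty means none found."""
    problems = [f"missing required key: {key}" for key in sorted(REQUIRED_KEYS - set(output))]
    if "type" in output and output["type"] != "dremio":
        problems.append("type must be 'dremio'")
    if "use_ssl" in output and output["use_ssl"] is not True:
        problems.append("use_ssl must be true")
    if "cloud_host" in output and output["cloud_host"] not in KNOWN_HOSTS:
        problems.append(f"unrecognized cloud_host: {output['cloud_host']}")
    folder = output.get("enterprise_catalog_folder")
    # Nested folders are period-separated, so "sales..emea" has an empty segment.
    if isinstance(folder, str) and any(part == "" for part in folder.split(".")):
        problems.append("enterprise_catalog_folder has an empty path segment")
    return problems
```

The host check is deliberately strict: it flags any host other than the two listed in the table, so update KNOWN_HOSTS if your control plane differs.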
Known Limitations
Model contracts are not supported.