dbt
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
You can use Dremio's dbt connector, dbt-dremio, to transform data in data sources that are connected to a Dremio project.
note
Model contracts are not supported.
Prerequisites
- Download the dbt-dremio package from https://github.com/dremio/dbt-dremio.
- Ensure that Python 3.9.x or later is installed.
- Before connecting from a dbt project to Dremio Cloud, follow these prerequisite steps:
  - Ensure that you have the ID of the Dremio project that you want to use. See Obtaining the ID of a Project.
  - Ensure that you have a personal access token (PAT) for authenticating to Dremio Cloud. See Creating a Token.
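The prerequisites above can be checked with a short script before installing anything. This is a minimal sketch and not part of dbt-dremio: the function name and the environment variable names `DREMIO_PROJECT_ID` and `DREMIO_PAT` are assumptions chosen for illustration, so substitute whatever mechanism you use to store the project ID and the token.

```python
import os
import sys

MIN_PYTHON = (3, 9)  # minimum Python version stated in the prerequisites


def missing_prerequisites(version_info=None, env=None):
    """Return a list of problems; an empty list means the prerequisites are met."""
    version_info = sys.version_info if version_info is None else version_info
    env = os.environ if env is None else env
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python 3.9 or later is required")
    # Hypothetical variable names, not defined by dbt or Dremio.
    if not env.get("DREMIO_PROJECT_ID"):
        problems.append("DREMIO_PROJECT_ID is not set")
    if not env.get("DREMIO_PAT"):
        problems.append("DREMIO_PAT is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

Keeping the PAT in an environment variable rather than in a script is one way to avoid committing it to version control.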
Installing
Install this package from PyPI by running this command:
Install dbt-dremio package
    pip install dbt-dremio
note
dbt-dremio works exclusively with dbt-core versions 1.2 to 1.7. If a dbt-core version below 1.2 is found, it is updated to 1.7. If no version of dbt-core is found, version 1.7 is installed automatically.
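To see whether an existing environment already falls inside that range, the installed dbt-core version can be inspected with the standard library. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that every patch release of 1.2 through 1.7 counts as supported, which the note above does not state explicitly.

```python
from importlib.metadata import PackageNotFoundError, version

SUPPORTED_MIN = (1, 2)  # lowest supported major.minor, per the note above
SUPPORTED_MAX = (1, 7)  # highest supported major.minor, per the note above


def is_supported(dbt_core_version):
    """Return True if a version string such as '1.7.4' is in the 1.2 to 1.7 range."""
    major, minor = (int(part) for part in dbt_core_version.split(".")[:2])
    return SUPPORTED_MIN <= (major, minor) <= SUPPORTED_MAX


def installed_dbt_core():
    """Return the installed dbt-core version string, or None if it is absent."""
    try:
        return version("dbt-core")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_dbt_core()
    if found is None:
        print("dbt-core is not installed")
    else:
        print(found, "supported" if is_supported(found) else "not supported")
```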
Initializing a dbt Project
- Run the command dbt init <project_name>.
- Select dremio as the database to use.
- Select the dremio_cloud option to generate a profile for your project.
Next, configure the profile for your dbt project.
Profiles
When you initialize a dbt project, you create a profile. For descriptions of the configurations in this profile, see Configurations.
Example Profile
[project name]:
  outputs:
    dev:
      cloud_host: https://api.dremio.cloud
      cloud_project_id: 1ab23456-78c9-01d2-de3f-456g7h890ij1
      object_storage_source: Samples
      object_storage_path: "samples.dremio.com"."NYC-taxi-trips"
      dremio_space: Folder1
      dremio_space_folder: Folder2
      pat: A1BCDrE2FwgH3IJkLM4NoPqrsT5uV6WXyza7I8bcDEFgJ9hIj0Kl1MNOPq2Rstu==
      threads: 1
      type: dremio
      use_ssl: true
      user: name@company.com
  target: dev
Configurations
Configuration | Required | Default Value | Description |
---|---|---|---|
cloud_host | Yes | https://api.dremio.cloud | US Control Plane: https://api.dremio.cloud; EU Control Plane: https://api.eu.dremio.cloud |
cloud_project_id | Yes | None | The ID of the Dremio project in which to run transformations. |
object_storage_path | No | no_schema | The path in the filesystem in which to create objects. The default is the root level of the filesystem. The dbt alias is root_path. Nested folders in the path are separated with periods. This value corresponds to the path shown on the Datasets page in Dremio. |
dremio_space | No | @username | The name of the Dremio folder in which to create views. The dbt alias is database. This value corresponds to the name of the folder in the catalog section of the Datasets page in Dremio. |
dremio_space_folder | No | no_schema | The subfolder in the Dremio folder in which to create views. The dbt alias is schema. Nested folders are separated with periods (for example, Folder2.Folder3). This value corresponds to the path shown on the Datasets page in Dremio. |
pat | Yes | None | The personal access token to use for authentication. See Personal Access Tokens for instructions about obtaining a token. |
threads | Yes | 1 | The number of threads the dbt project runs on. |
type | Yes | dremio | Auto-populated when creating a Dremio project. Do not change this value. |
use_ssl | Yes | true | The value must be true. |
user | Yes | None | The email address used as a username in Dremio Cloud. |
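The required settings in the table can be checked programmatically before running dbt. The following is a rough sketch rather than anything dbt-dremio provides: `profile_problems` is a hypothetical helper, and it assumes the profile output (for example, the `dev` entry) has already been loaded into a Python dictionary.

```python
# Settings marked "Yes" in the Required column of the table above.
REQUIRED_KEYS = (
    "cloud_host",
    "cloud_project_id",
    "pat",
    "threads",
    "type",
    "use_ssl",
    "user",
)


def profile_problems(output):
    """Check one profile output mapping against the configuration table."""
    problems = [
        f"missing required key: {key}" for key in REQUIRED_KEYS if key not in output
    ]
    # The table states that type stays dremio and use_ssl must be true.
    if output.get("type", "dremio") != "dremio":
        problems.append("type must be dremio")
    if output.get("use_ssl", True) is not True:
        problems.append("use_ssl must be true")
    return problems


if __name__ == "__main__":
    dev = {
        "cloud_host": "https://api.dremio.cloud",
        "cloud_project_id": "<project-id>",  # placeholder, not a real ID
        "pat": "<personal-access-token>",  # placeholder, not a real token
        "threads": 1,
        "type": "dremio",
        "use_ssl": True,
        "user": "name@company.com",
    }
    print(profile_problems(dev))
```

The optional settings (object_storage_path, dremio_space, dremio_space_folder) are deliberately left unchecked because the table gives defaults for them.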