dbt
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
You can use Dremio's dbt connector, dbt-dremio, to transform data in the data sources that are connected to a Dremio project.
dbt model contracts are not supported.
Prerequisites
- Download the dbt-dremio package from https://github.com/dremio/dbt-dremio.
- Ensure that Python 3.9.x or later is installed.
- Ensure that you are using Dremio Software version 22.0 or later.
- If you want to use TLS to secure the connection between dbt and Dremio Software, configure full wire encryption in your Dremio cluster. For instructions, see Configuring Wire Encryption.
Installing
Install the dbt-dremio package from PyPI by running this command:
pip install dbt-dremio
dbt-dremio works exclusively with dbt-core versions 1.2 through 1.7. If a dbt-core version earlier than 1.2 is found, it is updated to 1.7. If no version of dbt-core is found, version 1.7 is installed automatically.
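Because dbt-dremio supports only a narrow range of dbt-core versions, it can help to confirm which dbt-core version is installed before you initialize a project. The following Python sketch is illustrative only: the helper name is_supported_dbt_core is hypothetical, and it assumes that "1.2 through 1.7" covers every 1.2.x through 1.7.x release, including patch and pre-release versions.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def is_supported_dbt_core(version_str: str) -> bool:
    """Hypothetical helper: True if version_str is a 1.2.x through 1.7.x release."""
    # Only the major and minor components matter for this range check.
    match = re.match(r"(\d+)\.(\d+)", version_str)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_str!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return major == 1 and 2 <= minor <= 7


if __name__ == "__main__":
    # Look up the installed dbt-core distribution, if there is one.
    try:
        installed = version("dbt-core")
    except PackageNotFoundError:
        print("dbt-core is not installed")
    else:
        status = "supported" if is_supported_dbt_core(installed) else "not supported"
        print(f"dbt-core {installed} is {status} by dbt-dremio")
```

Only the major and minor numbers are compared, so a patch release such as 1.7.4 counts as supported while 1.8.0 does not.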
Initializing a dbt Project
- Run the command dbt init <project_name>.
- Select dremio as the database to use.
- Select one of these options to generate a profile for your project:
  - software_with_username_password, for working with a Dremio Software cluster and authenticating to the cluster with a username and a password
  - software_with_pat, for working with a Dremio Software cluster and authenticating to the cluster with a personal access token
Next, configure the profile for your dbt project.
Profiles
When you initialize a dbt project, you create one of these two profiles. You must configure the profile before you try to connect to Dremio Software.
- Profile for Dremio Software with Username/Password Authentication
- Profile for Dremio Software with Authentication Through a Personal Access Token
For descriptions of the configurations in these profiles, see Configurations.
Profile for Dremio Software with Username/Password Authentication
[project name]:
  outputs:
    dev:
      password: b9JtkIgI3uup9gGxxK
      port: 9047
      software_host: 192.0.2.0
      object_storage_source: Samples
      object_storage_path: '"samples.dremio.com"."Dremio University"'
      dremio_space: Space1
      dremio_space_folder: Folder1.Folder2
      threads: 1
      type: dremio
      use_ssl: true
      user: userName
  target: dev
Profile for Dremio Software with Authentication Through a Personal Access Token
[project name]:
  outputs:
    dev:
      pat: A1BCDrE2FwgH3IJkLM4NoPqrsT5uV6WXyza7I8bcDEFgJ9hIj0Kl1MNOPq2Rstu
      port: 9047
      software_host: 192.0.2.0
      object_storage_source: Samples
      object_storage_path: '"samples.dremio.com"."Dremio University"'
      dremio_space: Space1
      dremio_space_folder: Folder1.Folder2
      threads: 1
      type: dremio
      use_ssl: true
      user: userName
  target: dev
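Both example profiles set object_storage_path to "samples.dremio.com"."Dremio University", where each folder name is wrapped in double quotes because it contains a period or a space, while dremio_space_folder uses the plain value Folder1.Folder2. The Python sketch below builds such a value from a list of folder names. It is an illustration only: the function name build_dremio_path is hypothetical, and the rule that a name needs quoting whenever it contains characters other than letters, digits, and underscores is an assumption inferred from the examples, not a documented dbt-dremio rule.

```python
import re

# Assumption: unquoted folder names may contain only letters, digits, and underscores.
_SIMPLE_SEGMENT = re.compile(r"[A-Za-z0-9_]+")


def build_dremio_path(segments: list[str]) -> str:
    """Hypothetical helper: join folder names with periods, quoting names that need it."""
    parts = []
    for segment in segments:
        if _SIMPLE_SEGMENT.fullmatch(segment):
            parts.append(segment)
        else:
            # Wrap names that contain periods, spaces, or other characters in double quotes.
            parts.append(f'"{segment}"')
    return ".".join(parts)


print(build_dremio_path(["samples.dremio.com", "Dremio University"]))
# prints "samples.dremio.com"."Dremio University"
print(build_dremio_path(["Folder1", "Folder2"]))
# prints Folder1.Folder2
```

When a value like the first one goes into profiles.yml, enclosing the whole value in single quotes keeps the double quotes inside the string and the file valid YAML.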
Configurations
| Configuration | Required? | Default Value | Description |
|---|---|---|---|
| password | Yes, if you are not using the pat configuration. | None | The password of the account to use when logging in to the Dremio cluster. |
| pat | Yes, if you are not using the user and password configurations. | None | The personal access token to use for authenticating to Dremio. See Personal Access Tokens for instructions about obtaining a token. If values are specified for all three configurations (user, password, and pat), the personal access token takes precedence. |
| port | Yes | 9047 | The port for the Dremio Software cluster's API endpoints. |
| software_host | Yes | None | The hostname or IP address of the coordinator node of the Dremio cluster. |
| object_storage_source | No | $scratch | The name of the filesystem in which to create tables, materialized views, tests, and other objects. The dbt alias is datalake. This name corresponds to the name of a source in the Object Storage section of the Datasets page in Dremio. |
| object_storage_path | No | no_schema | The path in the filesystem in which to create objects. The default is the root level of the filesystem. The dbt alias is root_path. Nested folders in the path are separated with periods. This value corresponds to a folder path within the source on the Datasets page in Dremio. |
| dremio_space | No | @<username> | The name of the Dremio space in which to create views. The dbt alias is database. This value corresponds to the name of a space in the Spaces section of the Datasets page in Dremio. |
| dremio_space_folder | No | no_schema | The folder in the Dremio space in which to create views. The default is the top level of the space. The dbt alias is schema. Nested folders are separated with periods. This value corresponds to a folder path within the space on the Datasets page in Dremio. |
| threads | Yes | 1 | The number of threads the dbt project runs on. |
| type | Yes | dremio | Populated automatically when you initialize a dbt project for Dremio. Do not change this value. |
| use_ssl | Yes | true | Acceptable values are true and false. If the value is true, ensure that full wire encryption is configured in your Dremio cluster. See Prerequisites. |
| user | Yes | None | The username of the account to use when logging in to the Dremio cluster. |
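The rules in the table can be summarized as a check on a single output from profiles.yml. The following Python sketch is hypothetical: validate_dremio_output, ALIASES, DEFAULTS, and REQUIRED are illustrative names, not part of dbt-dremio. It renames the dbt aliases to their Dremio configuration names, fills in the listed defaults for the optional settings, and reports which authentication method applies. It assumes the output has already been parsed into a dictionary, that user is always required (as the table states), and that pat takes precedence over password.

```python
from typing import Any

# dbt aliases from the Configurations table, mapped to the Dremio configuration names.
ALIASES = {
    "datalake": "object_storage_source",
    "root_path": "object_storage_path",
    "database": "dremio_space",
    "schema": "dremio_space_folder",
}

# Defaults for the optional configurations, as listed in the table.
DEFAULTS = {
    "object_storage_source": "$scratch",
    "object_storage_path": "no_schema",
    "dremio_space_folder": "no_schema",
}

REQUIRED = ("port", "software_host", "threads", "type", "use_ssl", "user")


def validate_dremio_output(output: dict[str, Any]) -> tuple[dict[str, Any], str]:
    """Hypothetical check of one profiles.yml output; returns (settings, auth_method)."""
    settings: dict[str, Any] = {}
    for key, value in output.items():
        name = ALIASES.get(key, key)
        if name in settings:
            raise ValueError(f"{name} is set more than once (directly and via an alias)")
        settings[name] = value

    missing = [key for key in REQUIRED if key not in settings]
    if missing:
        raise ValueError(f"Missing required configurations: {', '.join(missing)}")
    if settings["type"] != "dremio":
        raise ValueError("type must be 'dremio'")

    # A personal access token takes precedence over a password when both are set.
    if settings.get("pat"):
        auth_method = "pat"
    elif settings.get("password"):
        auth_method = "password"
    else:
        raise ValueError("Set either pat or password")

    for key, default in DEFAULTS.items():
        settings.setdefault(key, default)
    # The default space is the user's home space, written as @<username>.
    settings.setdefault("dremio_space", f"@{settings['user']}")
    return settings, auth_method


# Example: both pat and password are set, and the dbt alias "schema" is used.
profile_output = {
    "pat": "example-token",
    "password": "example-password",
    "port": 9047,
    "software_host": "192.0.2.0",
    "schema": "Folder1.Folder2",
    "threads": 1,
    "type": "dremio",
    "use_ssl": True,
    "user": "userName",
}
settings, auth = validate_dremio_output(profile_output)
print(auth)  # pat
print(settings["dremio_space_folder"], settings["dremio_space"])  # Folder1.Folder2 @userName
```

Treating an alias and its Dremio name as the same key makes a profile that sets both (for example, schema and dremio_space_folder) fail loudly instead of silently keeping one of the two values.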