Version: current [26.x]

Configuring Your Values to Deploy Dremio to Kubernetes

Helm is a standard for managing Kubernetes applications, and the Helm chart defines how applications are deployed to Kubernetes. Dremio's Helm chart contains the default deployment configurations, which are specified in the values.yaml.

Dremio recommends configuring your deployment values in a separate .yaml file. Keeping your configuration separate makes it simpler to update to the latest version of the Helm chart, because you can carry the same configuration file across Helm chart updates.

Configuring Your Values

FREE TRIAL

If you are deploying a Free Trial, skip step 1 and configure your values in the values-overrides.yaml file that you downloaded using the link in the Free Trial registration email.

To configure your deployment values, do the following:

  1. Download the file values-overrides.yaml and save it locally.

    The values-overrides.yaml configuration file

    # A Dremio License is required
    dremio:
      license: "<your license key>"
      tag: 26.0.0

    # To pull images from Dremio's Quay you must create an image pull secret. For more info see:
    # https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod
    # All of the images are pulled using this same secret.
    imagePullSecrets:
      - <your-pull-secret-name>

    coordinator:
      web:
        auth:
          type: "internal"
        tls:
          enabled: false
          secret: "<your-tls-secret-name>"
      client:
        tls:
          enabled: false
          secret: "<your-tls-secret-name>"
      flight:
        tls:
          enabled: false
          secret: "<your-tls-secret-name>"
      volumeSize: 512Gi
      resources:
        limits:
          memory: 64Gi
        requests:
          cpu: 16
          memory: 60Gi

    # Where Dremio stores metadata, reflections, and uploaded files.
    # For more information, see https://docs.dremio.com/current/what-is-dremio/architecture#distributed-storage
    distStorage:
      # The supported distributed storage types are: aws, gcp, or azureStorage. For S3-compatible storage use aws.
      type: <your-distributed-storage-type> # Add here your distributed storage template from http://docs.dremio.com/current/deploy-dremio/configuring-kubernetes/#configuring-the-distributed-storage

    catalog:
      externalAccess:
        enabled: true
        tls:
          enabled: false
          secret: "<your-catalog-tls-secret-name>"
      # This is where Iceberg tables created in your catalog will reside
      storage:
        # The supported catalog storage types are: S3 or azure. For S3-compatible storage use S3.
        type: <your-catalog-storage-type> # Add here your catalog storage template from http://docs.dremio.com/current/deploy-dremio/configuring-kubernetes/#configuring-the-catalog-storage

    service:
      type: LoadBalancer

  2. Edit the values-overrides.yaml file to configure your values. See the following sections for details on each configuration option:

    important

    In all code examples, ... denotes additional values that have been omitted.

    Group all values associated with a given parent key in the YAML under a single instance of that parent, for example:

    Do
    dremio:
      key-one: value-one
      key-two:
        key-three: value-two
    Do not
    dremio:
      key-one: value-one

    dremio:
      key-two:
        key-three: value-two

    Please note the parent relationships at the top of each YAML snippet and subsequent values throughout this section. The hierarchy of keys and indentations in YAML must be respected.

  3. Save the values-overrides.yaml file.

Once done with the configuration, deploy Dremio to Kubernetes. See how in Deploying Dremio to Kubernetes.

License

Provide your license key. To obtain a license, see Licensing.
Add this configuration under the parent, as shown in the following example:

dremio:
  license: "<license-goes-here>"
...

Pull Secret

Provide the secret used to pull the images from Quay.io. To create the Kubernetes secret, use this example:

Properties for Kubernetes secret
kubectl create secret docker-registry dremio-docker-secret --docker-server=quay.io --docker-username=your_username --docker-password=your_password_for_username --docker-email=DOCKER_EMAIL

For more information, see Create a Secret by providing credentials on the command line (the Docker registry is quay.io). All of the images are pulled using this same secret.

note

Pods can only reference image pull secrets in their own namespace, so this process needs to be done on the namespace where Dremio is being deployed.

Add this configuration under the parent, as shown in the following example:

imagePullSecrets:
  - <your-k8s-secret-name>

Coordinator

Resource Configuration

Configure the volume size, resources limits, and resources requests. To configure these values, see Recommended Resources Configuration.

Add this configuration under the parents, as shown in the following example:

coordinator:
  resources:
    requests:
      cpu: 15
      memory: 30Gi
  volumeSize: 100Gi
...

Identity Provider

Optionally, you can configure authentication via an identity provider. Each type of identity provider requires an additional configuration file provided during Dremio's deployment.

Select the authentication type, and follow the corresponding link for instructions on how to create the associated configuration file:

Add this configuration under the parents, as shown in the following example:

coordinator:
  web:
    auth:
      type: <auth-type>
...

The identity provider configuration file can be embedded in your values-overrides.yaml. To do this, use the ssoFile option and provide the JSON content constructed per the instructions linked above. Here is an example for Microsoft Entra ID:

coordinator:
  web:
    auth:
      enabled: true
      type: "azuread"
      ssoFile: |
        {
          "oAuthConfig": {
            "clientId": "<my-client-id>",
            "clientSecret": "<my-secret>",
            "redirectUrl": "<my-redirect-url>",
            "authorityUrl": "https://login.microsoftonline.com/<my-tenant-id>/v2.0",
            "scope": "openid profile",
            "jwtClaims": {
              "userName": "preferred_username"
            }
          }
        }
...

For examples for the other types, see Identity Providers.

This is not the only configuration file that can be embedded inside the values-overrides.yaml file. However, these are generally used for advanced configurations. For more information, see Additional Configuration.

Transport Level Security

Optionally, enable the desired level of TLS by setting enabled: true for client, Arrow Flight, or web TLS. To provide the TLS secret, see Creating a TLS Secret.

Add this configuration under the parents, as shown in the following example:

coordinator:
  client:
    tls:
      enabled: false
      secret: <my-tls-secret>
  flight:
    tls:
      enabled: false
      secret: <my-tls-secret>
  web:
    tls:
      enabled: false
      secret: <my-tls-secret>
...
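For example, enabling TLS for the web console only, while leaving the other endpoints unchanged, would look like the following sketch; the secret name is an example:

```yaml
coordinator:
  web:
    tls:
      enabled: true
      secret: dremio-tls-web
```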

Coordinator's Distributed Storage

This is where Dremio stores metadata, reflections, and uploaded files, and it's required for Dremio to be operational. The supported types are AWS S3 or S3-compatible storage, Azure Storage, and Google Cloud Storage (GCS). For examples of configurations, see Configuring the Distributed Storage. Add this configuration under the parent, as shown in the following example:

distStorage:
  type: "<my-dist-store-type>"
...

Dremio Catalog

The configuration for Dremio Catalog has several options:

  • Configuring storage for Dremio Catalog is mandatory since this is the location where Iceberg tables created in the Catalog will be written. For configuring the storage, see Configuring Storage for Dremio Catalog.
    Add this configuration under the parents, as shown in the following example:

    catalog:
      externalAccess:
        enabled: true
    ...
  • (Optional) Use TLS for external access to require clients connecting to Dremio Catalog from outside the namespace to use TLS. To configure it, see Configuring TLS for Dremio Catalog External Access.
    Add this configuration under the parents, as shown in the following example:

    catalog:
      externalAccess:
        enabled: true
        tls:
          enabled: false
          secret: <my-catalog-tls-secret>
    ...
  • (Optional) If Dremio coordinator web access is using TLS, additional configuration is necessary. To configure it, see Configuring Dremio Catalog When Coordinator Web Is Using TLS.
    Add this configuration under the parents, as shown in the following example:

    catalog:
      externalAccess:
        enabled: true
        authentication:
          authServerHostname: <my-auth-server-host>
    ...


Configuring Your Values - Advanced

Dremio Platform Images

The Dremio platform requires 18 images when running fully featured. All images are published by Dremio to our Quay.io repository and are listed below. If you want to use a private mirror of our repository, add the snippet below to values-overrides.yaml to point to your own.

Dremio Platform Images
note

If creating a private mirror, use the same repository names and tags from Dremio's Quay.io.
This is important for supportability.

dremio:
  image:
    repository: quay.io/dremio/dremio-enterprise
    tag: <The image tag from Quay.io>
busyBox:
  image:
    repository: quay.io/dremio/busybox
    tag: <The image tag from Quay.io>
k8s:
  image:
    repository: quay.io/dremio/alpine/k8s
    tag: <The image tag from Quay.io>
engine:
  operator:
    image:
      repository: quay.io/dremio/dremio-engine-operator
      tag: <The image tag from Quay.io>
zookeeper:
  image:
    repository: quay.io/dremio/zookeeper
    tag: <The image tag from Quay.io>
opensearch:
  image:
    repository: quay.io/dremio/dremio-search-opensearch
    tag: <The image tag from Quay.io> # The tag version must be a valid opensearch version as listed here https://opensearch.org/docs/latest/version-history/
preInstallJob:
  image:
    repository: quay.io/dremio/dremio-search-init
    tag: <The image tag from Quay.io>
opensearchOperator:
  manager:
    image:
      repository: quay.io/dremio/dremio-opensearch-operator
      tag: <The image tag from Quay.io>
  kubeRbacProxy:
    image:
      repository: quay.io/dremio/kubebuilder/kube-rbac-proxy
      tag: <The image tag from Quay.io>
mongodbOperator:
  image:
    repository: quay.io/dremio/dremio-mongodb-operator
    tag: <The image tag from Quay.io>
mongodb:
  image:
    repository: quay.io/dremio/percona/percona-server-mongodb
    tag: <The image tag from Quay.io>
catalogservices:
  image:
    repository: quay.io/dremio/dremio-catalog-services-server
    tag: <The image tag from Quay.io>
catalog:
  image:
    repository: quay.io/dremio/dremio-catalog-server
    tag: <The image tag from Quay.io>
  externalAccess:
    image:
      repository: quay.io/dremio/dremio-catalog-server-external
      tag: <The image tag from Quay.io>
nats:
  container:
    image:
      repository: quay.io/dremio/nats
      tag: <The image tag from Quay.io>
  reloader:
    image:
      repository: quay.io/dremio/natsio/nats-server-config-reloader
      tag: <The image tag from Quay.io>
natsBox:
  container:
    image:
      repository: quay.io/dremio/natsio/nats-box
      tag: <The image tag from Quay.io>
telemetry:
  image:
    repository: quay.io/dremio/otel/opentelemetry-collector-contrib
    tag: <The image tag from Quay.io>

Scale-out Coordinators

Dremio can scale to support high concurrency use cases through scaling coordinators. Multiple stateless coordinators rely on the primary coordinator to manage Dremio's state, enabling Dremio to support many more concurrent users. These scale-out coordinators are intended for high query throughput and are not applicable for standby or disaster recovery. While scale-out coordinators generally reduce the load on the primary coordinator, the primary coordinator's vCPU request should be increased for every two scale-outs added to avoid negatively impacting performance.

Perform this configuration in this section of the file, where count refers to the number of scale-outs. A count of 0 will provision only the primary coordinator:

coordinator:
  count: 1
...
note

When using scale-out coordinators, session affinity should be enabled on the load balancer. See Advanced Load Balancer Configuration.
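For example, a sketch combining two scale-out coordinators with session affinity enabled on the load balancer; the count is illustrative:

```yaml
coordinator:
  count: 2

service:
  type: LoadBalancer
  sessionAffinity: true
```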

Configuring Kubernetes Pod Metadata (including Node Selector)

It's possible to add metadata both globally and to each of the StatefulSets (coordinators, classic engines, ZooKeeper, etc.), including configuring a node selector for pods to use specific node pools.

warning

Define these values with caution and a clear understanding of the expected entries, because any misconfiguration may result in Kubernetes being unable to schedule your pods.

Use the following options to add metadata:

  • labels: - Configured using key-value pairs as shown in the following examples:

    Example of a global label
    labels:
      foo: bar
    Example of a StatefulSet label
    catalog:
      labels:
        foo: bar
    ...

    For more information on labels, see the Kubernetes documentation on Labels and Selectors.

  • annotations: - Configured using key-value pairs as shown in the following examples.

    Example of a global annotation
    annotations:
      foo: bar
    Example of a StatefulSet annotation
    mongodb:
      annotations:
        foo: bar
    ...

    For more information on annotations, see the Kubernetes documentation on Annotations.

  • tolerations: - Configured using a specific structure as shown in the following examples:

    Example of a global toleration
    tolerations:
      - key: "key1"
        operator: "Equal"
        value: "value1"
        effect: "NoSchedule"
    Example of a StatefulSet toleration
    catalog:
      tolerations:
        - key: "key1"
          operator: "Equal"
          value: "value1"
          effect: "NoSchedule"
    ...

    For more information on tolerations, see the Kubernetes documentation on Taints and Tolerations.

  • nodeSelector: - Configured using a specific structure as shown in the following examples.

    Example of a global node selector
    nodeSelector:
      nodetype: coordinator
    Example of a StatefulSet node selector
    coordinator:
      nodeSelector:
        nodetype: coordinator
    ...

To understand the structure and values to use in the configurations, see "Metadata Structure and Values" below:

Metadata Structure and Values

For global metadata:

annotations: {}
labels: {}
tolerations: []
nodeSelector: {}

For StatefulSet metadata:

coordinator:
  annotations: {}
  labels: {}
  tolerations: []
  nodeSelector:
    nodetype: coordinator
executor:
  annotations: {}
  labels: {}
  tolerations: []
  nodeSelector:
    nodetype: coordinator
catalog:
  annotations: {}
  labels: {}
  tolerations: []
  nodeSelector:
    nodetype: catalog
catalogservices:
  annotations: {}
  labels: {}
  tolerations: []
  nodeSelector:
    nodetype: catalogservices
mongodb:
  annotations: {}
  labels: {}
  tolerations: []
  nodeSelector:
    nodetype: mongo
opensearch:
  annotations: {}
  labels: {}
  tolerations: []
  nodeSelector:
    nodetype: operators
oidcProxy:
  annotations: {}
  labels: {}
  tolerations: []
  nodeSelector:
    nodeType: utils
preInstallJob:
  annotations: {}
  labels: {}
  tolerations: []
  nodeSelector:
    nodeType: jobs
nats:
  podTemplate:
    merge:
      spec:
        annotations: {}
        labels: {}
        tolerations: []
        nodeSelector:
          nodetype: nats
mongodbOperator:
  annotations: {}
  labels: {}
  tolerations: []
  nodeSelector:
    nodetype: operators
opensearchOperator:
  annotations: {}
  labels: {}
  tolerations: []
  nodeSelector:
    nodetype: operators

Configuring Extra Environment Variables

Optionally, you can define extra environment variables to be passed to either Coordinators or Executors. This can be done by adding the configuration under the parents as shown in the following example:

coordinator:
  extraEnvs:
    - name: <my-variable-name>
      value: "<my-variable-value>"
...

executor:
  extraEnvs:
    - name: <my-variable-name>
      value: "<my-variable-value>"
...

Environment variables defined as shown will be applied to Executors of both Classic Engines and New Engines.

Advanced Load Balancer Configuration

Dremio will create a public load balancer by default, and the Dremio Client service will provide an external IP to connect to Dremio. For more information, see Connecting to the Dremio Console.

  • Private Cluster - For private Kubernetes clusters (no public endpoint), set internalLoadBalancer: true. Add this configuration under the parent as shown in the following example:

    service:
      type: LoadBalancer
      internalLoadBalancer: true
    ...
  • Static IP - To define a static IP for your load balancer, set loadBalancerIP: <your-static-IP>. If unset, an available IP will be assigned upon creation of the load balancer. Add this configuration under the parent as shown in the following example:

    service:
      type: LoadBalancer
      loadBalancerIP: <my-desired-ip>
    ...
    tip

    This can be helpful if DNS is configured to expect Dremio to have a specific IP.

  • Session Affinity - If leveraging Scale-out Coordinators, set sessionAffinity: true. Add this configuration under the parent as shown in the following example:

    service:
      type: LoadBalancer
      sessionAffinity: true
    ...

Advanced TLS Configuration for OpenSearch

Dremio generates TLS certificates for OpenSearch by default, and they are rotated monthly. However, if you want to use your own, you need to create two secrets containing the relevant certificates. The format of these secrets is different from the other TLS secrets shown on this page, and the tls.crt, tls.key, and ca.crt files must be in PEM format. Use the example below as a reference to create your secrets:

kubectl create secret generic opensearch-tls-certs \
--from-file=tls.crt --from-file=tls.key --from-file=ca.crt

kubectl create secret generic opensearch-tls-certs-admin \
--from-file=tls.crt --from-file=tls.key --from-file=ca.crt

Add the snippet below to the values-overrides.yaml file before deploying Dremio. Because OpenSearch requires TLS, if certificate generation is disabled, you must provide a certificate.

opensearch:
  tlsCertsSecretName: <opensearch-tls-certs>
  disableTlsCertGeneration: true
...
...

Advanced Configuration of Engines

Dremio's default resource offset is reserve-2-8, where the first value represents 2 vCPUs and the second represents 8 GB of RAM. If you need to change this default for your created engines, add the following snippet to values-overrides.yaml and set the defaultOffset to one of the configurable offsets listed below, which are available out of the box:

  • reserve-0-0
  • reserve-2-4
  • reserve-2-8
  • reserve-2-16

The listed values are keys and must be provided exactly as shown in the snippet below.

engine:
  options:
    resourceAllocationOffsets:
      defaultOffset: reserve-2-8
...

Configuration of Classic Engines

note
  • You should only use classic engines if the new ones introduced in Dremio 26.0 are not appropriate for your use case. Classic and new engines are not intended to be used side by side.
  • Classic engines will not auto-start/auto-stop, which is only possible with the new engines.

The classic way of configuring engines is still supported, and you can add this snippet to values-overrides.yaml as part of the deployment. Note that this snippet is a configuration example, and you should adjust the values to your own case.

An example of a configuration of a classic engine
executor:
  resources:
    requests:
      cpu: "16"
      memory: "120Gi"
    limits:
      memory: "120Gi"
  engines: ["default"]
  count: 3
  volumeSize: 128Gi
  cloudCache:
    enabled: true
    volumes:
      - size: 128Gi
...

Telemetry

Telemetry egress is enabled by default. These metrics provide visibility into various components and services, ensuring optimal performance and reliability. To disable egress, add the following to your values-overrides.yaml:

telemetry:
  enabled: false
...

Disabling Parts of the Deployment

You can disable some components of the Dremio platform if their functionality does not pertain to your use case. The rest of Dremio's functionality will continue to work if any of the components described in this section are disabled.

To disable Semantic Search, add this configuration under the parent as shown in the following example:

opensearch:
  enabled: false
  replicas: 0

Additional Configuration

Dremio has several configuration and binary files that define its behavior for enabling authentication via an identity provider, logging, connecting to Hive, and so on. During the deployment, these files are combined and used to create a Kubernetes ConfigMap. This ConfigMap is, in turn, used by the Dremio deployment as the source of truth for various settings. The options below can be used to embed these configuration files in the values-overrides.yaml file.

To inspect Dremio's configuration files or perform a more complex operation not shown here, see Downloading Dremio's Helm Charts.

Additional Config Files

Use the configFiles option to add configuration files to your Dremio deployment. You can add multiple files, each as a key-value pair, where the key is the file name and the value is the file content. These can be TXT, XML, or JSON files. For example, here is how to embed the configuration for HashiCorp Vault, followed by a separate example file:

dremio:
  configFiles:
    vault_config.json: |
      {
        "vaultUrl": "https://my-vault.com",
        "namespace": "optional/dremio/global/vault/namespace",
        "auth": {
          "kubernetes": {
            "vaultRole": "dremio-vault-role",
            "serviceAccountJwt": "file:///optional/custom/path/to/serviceAccount/jwt",
            "loginMountPath": "optional/custom/kubernetes/login/path"
          }
        }
      }
    another_config.json: |
      {
        "key in this file": "content of this key"
      }
...

Additional Config Variables

Use the dremioConfExtraOptions option to add new variables to your Dremio deployment. For example, here is how to enable TLS between executors and coordinators, leveraging auto-generated self-signed certificates.

dremio:
  dremioConfExtraOptions:
    "services.fabric.ssl.enabled": true
    "services.fabric.ssl.auto-certificate.enabled": true
...

Additional Config Binary Files

Use the configBinaries option to provide binary configuration files (encoded as base64). For example, a JKS file for a custom truststore. The key is the file name, and the value is the file content. Add this configuration under the parents as shown in the following example:

dremio:
  configBinaries:
    custom-truststore.jks: "base64EncodedBinaryContent"
...
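To produce the base64 value, encode the binary file with the base64 utility. The file below is a placeholder created only for illustration; replace it with your real truststore:

```shell
# Placeholder file standing in for a real truststore (illustration only).
printf 'example-binary-content' > custom-truststore.jks

# Emit the file as a single-line base64 string (-w0 disables line wrapping on
# GNU coreutils). Paste the output as the configBinaries value.
base64 -w0 custom-truststore.jks
```

On macOS, use `base64 -i custom-truststore.jks` instead, since the BSD base64 does not wrap lines by default.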

Additional Advanced Configs

Use the advancedConfigs option to enable advanced configurations. Add this configuration under the parent, as shown in the following example, which provides a password for a custom trust store:

dremio:
  advancedConfigs:
    trustStore:
      enabled: true
      password: "<my-truststore-pass>"
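Following the grouping rule from earlier in this topic, the trust store binary and its password belong under a single dremio: parent. A combined sketch, with placeholder names and values:

```yaml
dremio:
  configBinaries:
    custom-truststore.jks: "<base64-encoded-content>"
  advancedConfigs:
    trustStore:
      enabled: true
      password: "<my-truststore-pass>"
```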

Hive

Use the hive2ConfigFiles option to configure Hive 2. Add this configuration under the parents as shown in the following example:

dremio:
  hive2ConfigFiles:
    hive-site.xml: |
      <?xml version="1.0" encoding="UTF-8"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <configuration>
        <property>
          <name>hive.metastore.uris</name>
          <value>thrift://hive-metastore:9083</value>
        </property>
      </configuration>
...

Use the hive3ConfigFiles option to configure Hive 3. Add this configuration under the parents as shown in the following example:

dremio:
  hive3ConfigFiles:
    hive-site.xml: |
      <?xml version="1.0" encoding="UTF-8"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <configuration>
        <property>
          <name>hive.metastore.uris</name>
          <value>thrift://hive3-metastore:9083</value>
        </property>
      </configuration>
...

References

The table in this section contains the recommended values for resource requests and volume size to configure Dremio components. In the values-overrides.yaml file, set the following values:

resources:
  requests:
    memory: # Put here the value from the Memory column.
    cpu: # Put here the value from the CPU column.
volumeSize: # Put here the value from the Volume Size column, if any.

Dremio recommends using the Basic Configuration values for evaluation or testing purposes, and adjusting them toward the Production Configuration values, which Dremio recommends for operating in a production environment.

| Dremio Component | Basic Memory | Basic CPU | Basic Volume Size | Basic Pod Count | Production Memory | Production CPU | Production Volume Size | Production Pod Count |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Coordinator | 8Gi | 4 | 50Gi | 1 | 64Gi | 32 | 512Gi | 1 |
| Catalog Server | 8Gi | 5 | - | 1 | 8Gi | 4 | - | 1 |
| Catalog Server (External) | 8Gi | 6 | - | 1 | 8Gi | 4 | - | 1 |
| Catalog Service Server | 8Gi | 7 | - | 1 | 8Gi | 4 | - | 1 |
| Engine Operator | 1Gi | 1 | - | 1 | 1Gi | 1 | - | 1 |
| OpenSearch | 8Gi | 1500m | 10Gi | 3 | 16Gi | 2 | 100Gi | 3 |
| MongoDB | 2Gi | 4 | 10Gi | 3 | 4Gi | 8 | 512Gi 1 | 3 |
| NATS | 1Gi | 700m | - | 3 | 1Gi | 700m | - | 3 |
| ZooKeeper | 1Gi | 500m | - | 3 | 1Gi | 500m | - | 3 |
| Open Telemetry | 1Gi | 1 | - | 1 | 1Gi | 1 | - | 1 |

1 You can use a smaller volume size if you do not heavily use Iceberg.
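As a worked example, the Basic Configuration row for the Coordinator maps to the following values-overrides.yaml fragment:

```yaml
coordinator:
  resources:
    requests:
      cpu: 4
      memory: 8Gi
  volumeSize: 50Gi
```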

The following snippets contain the resource configuration YAML for each Dremio platform component:

Dremio Platform Resource Configuration YAML
Coordinator
coordinator:
  resources:
    requests:
      cpu: "32"
      memory: "64Gi"
    limits:
      memory: "64Gi"
  volumeSize: "512Gi"
Catalog Server
catalog:
  resources:
    requests:
      cpu: "4"
      memory: "8Gi"
    limits:
      cpu: "4"
      memory: "8Gi"
Catalog Service Server
catalogservices:
  resources:
    requests:
      cpu: "4"
      memory: "8Gi"
    limits:
      cpu: "4"
      memory: "8Gi"
OpenSearch
opensearch:
  resources:
    requests:
      memory: "16Gi"
      cpu: "2"
    limits:
      memory: "16Gi"
      cpu: "2"
MongoDB
mongodb:
  resources:
    requests:
      cpu: "2"
      memory: "2Gi"
    limits:
      cpu: "4"
      memory: "2Gi"
  storage:
    resources:
      requests:
        storage: "512Gi"
NATS
nats:
  resources:
    requests:
      cpu: "500m"
      memory: "1024Mi"
    limits:
      cpu: "750m"
      memory: "1536Mi"
ZooKeeper
zookeeper:
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      memory: "1Gi"
  volumeSize: "10Gi"
Open Telemetry
telemetry:
  resources:
    requests:
      cpu: "1"
      memory: "1Gi"
    limits:
      cpu: "2"
      memory: "2Gi"

Creating a TLS Secret

If you have enabled TLS in your values-overrides.yaml, the corresponding secrets must be created before deploying Dremio. To create a secret, run the following command:

kubectl create secret tls <your-tls-secret-name> --key privkey.pem --cert cert.pem

For more information, see kubectl create secret tls.
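If you only need a certificate for testing, a self-signed key pair with the file names used by the command above can be generated with openssl. The hostname in -subj is an example; use the name clients will use to reach Dremio:

```shell
# Generate a throwaway self-signed certificate and key (testing only).
# Produces privkey.pem and cert.pem for the kubectl create secret tls command.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout privkey.pem -out cert.pem \
  -subj "/CN=dremio.example.com"
```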

caution

TLS for OpenSearch requires a secret of a different makeup. See Advanced TLS Configuration for OpenSearch.

Configuring the Distributed Storage

Dremio’s distributed store uses scalable and fault-tolerant storage and it is configured as follows:

  1. In the values-overrides.yaml file, find the section with distStorage: and type:

    distStorage:
      type: "<my-dist-store-type>"
    ...
  2. In type:, configure your storage provider with one of the following values:

    • "gcp" - For GCP Cloud Storage.
    • "aws" - For AWS S3 or S3-compatible storage.
    • "azureStorage" - For Azure Storage.
  3. Copy the template below for the storage provider you configured in step 2 (the GCS template is shown), paste it in place of the type: placeholder line, and configure your distributed storage values.

type: "gcp"
gcp:
  bucketName: "GCS Bucket Name"
  path: "/"
  authentication: "auto"
  # If using serviceAccountKeys, uncomment the section below, referencing the values from
  # the service account credentials JSON file that you generated:
  #credentials:
  #  projectId: GCP Project ID that the Google Cloud Storage bucket belongs to.
  #  clientId: Client ID for the service account that has access to the Google Cloud Storage bucket.
  #  clientEmail: Email for the service account that has access to the Google Cloud Storage bucket.
  #  privateKeyId: Private key ID for the service account that has access to the Google Cloud Storage bucket.
  #  privateKey: |-
  #    -----BEGIN PRIVATE KEY-----\n Replace me with full private key value. \n-----END PRIVATE KEY-----\n

Where:

  • bucketName - The name of the GCS bucket for distributed storage.
  • path - The path relative to the bucket to create Dremio's directories.
  • authentication - Valid types are: serviceAccountKeys or auto.
    • When using auto, Dremio uses Google Application Default Credentials to authenticate. This is platform-dependent and may not be available in all Kubernetes clusters.
    • When using a GCS bucket on GKE, we recommend enabling Workload Identity and configuring a Kubernetes service account for Dremio with an associated workload identity that has access to the GCS bucket.
  • credentials - If using serviceAccountKeys authentication, uncomment the credentials section in the template above.
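Assembled under distStorage, a minimal GCS configuration using auto authentication would look like the following sketch; the bucket name is an example:

```yaml
distStorage:
  type: "gcp"
  gcp:
    bucketName: "my-dremio-bucket"
    path: "/"
    authentication: "auto"
```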

Configuring Storage for Dremio Catalog

To use Dremio Catalog, configure the storage settings based on your storage provider (for example, Amazon S3, Azure Storage, or Google Cloud Storage). This configuration is required to enable support for vended credentials and to allow access to the table metadata necessary for Iceberg table operations.

  1. In the values-overrides.yaml file, find the section to configure your storage provider under the parents, as shown in the following example:

    catalog:
      storage:
        location: <your-object-store-path>
        type: <your-object-store-type>
    ...
  2. To configure it, follow the steps for your storage provider. The Amazon S3 steps are shown below:

    To use Dremio Catalog with Amazon S3, do the following:

    1. Create an IAM user or use an existing IAM user for Dremio Catalog.

    2. Create an IAM policy that grants access to your S3 location. For example:

      Example of a policy
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": [
              "s3:PutObject",
              "s3:GetObject",
              "s3:GetObjectVersion",
              "s3:DeleteObject",
              "s3:DeleteObjectVersion"
            ],
            "Resource": "arn:aws:s3:::<my_bucket>/*"
          },
          {
            "Effect": "Allow",
            "Action": [
              "s3:ListBucket",
              "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::<my_bucket>",
            "Condition": {
              "StringLike": {
                "s3:prefix": [
                  "*"
                ]
              }
            }
          }
        ]
      }
    3. Create an IAM role to grant privileges to the S3 location.

      1. In your AWS console, select Create Role.
      2. Enter an externalId. For example, my_catalog_external_id.
      3. Attach the policy created in the previous step and create the role.
    4. Create IAM user permissions to access the bucket via STS:

      1. Select the IAM role created in the previous step.

      2. Edit the trust policy and add the following:

        Trust policy
        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Sid": "",
              "Effect": "Allow",
              "Principal": {
                "AWS": "<dremio_catalog_user_arn>"
              },
              "Action": "sts:AssumeRole",
              "Condition": {
                "StringEquals": {
                  "sts:ExternalId": "<dremio_catalog_external_id>"
                }
              }
            }
          ]
        }

        Replace the following values with the ones obtained in the previous steps:

        • <dremio_catalog_user_arn> - The IAM user that was created in the first step.
        • <dremio_catalog_external_id> - The external ID that was created in the third step.
        note

        The sts:AssumeRole permission is required for Dremio Catalog to function with vended credentials, as it relies on the STS temporary token to perform these validations.

    5. Configure Dremio Catalog in the values-overrides.yaml file as follows:

      catalog:
        storage:
          location: s3://<your_bucket>/<your_folder>
          type: S3
          s3:
            region: <bucket_region>
            roleArn: <dremio_catalog_iam_role> # The role that was created in step 3
            userArn: <dremio_catalog_user_arn> # The IAM user that was created in step 1
            externalId: <dremio_catalog_external_id> # The external ID that was created in step 3
            useAccessKeys: false # Set to true if you intend to use access keys. See the note below.
      ...
      note

      If your role requires AWS Secret Keys to access the bucket and STS, you must create a Kubernetes secret named catalog-server-s3-storage-creds to access the configured location. Below is a simple example of Amazon S3 using an access key and a secret key:

      Example for Amazon S3 using an access key and a secret key
      export AWS_ACCESS_KEY_ID=<access-key>
      export AWS_SECRET_ACCESS_KEY=<secret-key>
      kubectl create secret generic catalog-server-s3-storage-creds \
        --namespace $NAMESPACE \
        --from-literal=awsAccessKeyId=$AWS_ACCESS_KEY_ID \
        --from-literal=awsSecretAccessKey=$AWS_SECRET_ACCESS_KEY

Configuring TLS for Dremio Catalog External Access

For clients connecting to Dremio Catalog from outside the namespace, TLS can be enabled for Dremio Catalog external access as follows:

  1. Enable external access with TLS and provide the TLS secret. See the section Creating a TLS Secret.
  2. In the values-overrides.yaml file, find the Dremio Catalog configuration section:
    catalog:
    ...
  3. Configure TLS for Dremio Catalog as follows:
    catalog:
      externalAccess:
        enabled: true
        tls:
          enabled: true
          secret: dremio-tls-secret-catalog
    ...

Configuring Dremio Catalog when Coordinator Web is Using TLS

When the Dremio coordinator is using TLS for Web access (i.e., when coordinator.web.tls.enabled is set to true), then Dremio Catalog external access must be configured appropriately, or client authentication will fail. For that, configure Dremio Catalog as follows:

  1. In the values-overrides.yaml file, find the Dremio Catalog configuration section:

    catalog:
    ...
  2. Configure Dremio Catalog as follows:

    catalog:
      externalAccess:
        enabled: true
        authentication:
          authServerHostname: dremio-master-0.dremio-cluster-pod.{{ .Release.Namespace }}.svc.cluster.local
    ...

    The authServerHostname must match the CN (or the SAN) field of the (master) coordinator Web TLS certificate.

    In case it does not match the CN or SAN fields of the TLS certificate, as a last resort, it is possible to disable hostname verification (disableHostnameVerification: true):

    catalog:
      externalAccess:
        enabled: true
        authentication:
          authServerHostname: dremio-master-0.dremio-cluster-pod.{{ .Release.Namespace }}.svc.cluster.local
          disableHostnameVerification: true
    ...

Downloading Dremio's Helm Chart

You can perform more advanced configurations beyond those described in this topic. However, proceed with caution—making changes without a clear understanding may lead to unexpected or undesired behavior. To do an advanced configuration, you must pull Dremio’s Helm charts.

Pull the Helm charts using the following command:

helm pull oci://quay.io/dremio/dremio-helm:<image-tag> --untar

This will create a new directory called dremio-helm containing the chart.

For more information, see Helm Pull.

Overriding Additional Values

After completing the helm pull:

  1. Find the values.yaml file, open it, and check the configurations you want to override.
  2. Copy what you want to override from the values.yaml to values-overrides.yaml and configure the file with your values.
  3. Save the values-overrides.yaml file.

Once done with the configuration, deploy Dremio to Kubernetes via the OCI Repo. See how in Deploying Dremio to Kubernetes.

Manual Modifications to Deployment files

important

For modifications to these files to take effect, you must install Dremio using a local copy of the Helm chart. Thus, the helm install command must reference a local folder, not the OCI repo like Quay. For more information and sample commands, see Helm install.

After completing the helm pull, you can edit the charts directly. This may be necessary to add deployment-specific modifications not catered for in the Additional Configuration section. These typically require modifications to files in the /config directory. Any customizations to your Dremio environment are propagated to all the pods when installing or upgrading the deployment.