Configuring Your Values to Deploy Dremio to Kubernetes
Helm is a standard for managing Kubernetes applications, and the Helm chart defines how applications are deployed to Kubernetes. Dremio's Helm chart contains the default deployment configurations, which are specified in the values.yaml file.
Dremio recommends configuring your deployment values in a separate .yaml file. You can then carry that file across Helm chart updates, which makes it simpler to move to the latest version of the chart.
Configuring Your Values
Skip step 1 if deploying a Free Trial. Configure your values in the values-overrides.yaml file you downloaded using the link in the email received during the Free Trial registration.
To configure your deployment values, do the following:
- Download the file values-overrides.yaml and save it locally.
  The values-overrides.yaml configuration file
  # A Dremio License is required
  dremio:
    license: "<your license key>"
    tag: 26.0.0
  # To pull images from Dremio's Quay you must create an image pull secret. For more info see:
  # https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod
  # All of the images are pulled using this same secret.
  imagePullSecrets:
    - <your-pull-secret-name>
  coordinator:
    web:
      auth:
        type: "internal"
      tls:
        enabled: false
        secret: "<your-tls-secret-name>"
    client:
      tls:
        enabled: false
        secret: "<your-tls-secret-name>"
    flight:
      tls:
        enabled: false
        secret: "<your-tls-secret-name>"
    volumeSize: 512Gi
    resources:
      limits:
        memory: 64Gi
      requests:
        cpu: 16
        memory: 60Gi
  # Where Dremio stores metadata, reflections, and uploaded files.
  # For more information, see https://docs.dremio.com/current/what-is-dremio/architecture#distributed-storage
  distStorage:
    # The supported distributed storage types are: aws, gcp, or azureStorage. For S3-compatible storage use aws.
    type: <your-distributed-storage-type> # Add here your distributed storage template from http://docs.dremio.com/current/deploy-dremio/configuring-kubernetes/#configuring-the-distributed-storage
  catalog:
    externalAccess:
      enabled: true
      tls:
        enabled: false
        secret: "<your-catalog-tls-secret-name>"
    # This is where Iceberg tables created in your catalog will reside
    storage:
      # The supported catalog storage types are: S3 or azure. For S3-compatible storage use S3.
      type: <your-catalog-storage-type> # Add here your catalog storage template from http://docs.dremio.com/current/deploy-dremio/configuring-kubernetes/#configuring-the-catalog-storage
  service:
    type: LoadBalancer
- Edit the values-overrides.yaml file to configure your values. See the following sections for details on each configuration option:
  - License
  - Pull Secret
  - Coordinator
  - Coordinator's Distributed Storage
  - Dremio Catalog
  - Advanced Values Configurations
  Important: In all code examples, ... denotes additional values that have been omitted.
  Group all values associated with a given parent key in the YAML under a single instance of that parent, for example:
  Do:
  dremio:
    key-one: value-one
    key-two:
      key-three: value-two
  Do not:
  dremio:
    key-one: value-one
  dremio:
    key-two:
      key-three: value-two
  Please note the parent relationships at the top of each YAML snippet and subsequent values throughout this section. The hierarchy of keys and indentations in YAML must be respected.
- Save the values-overrides.yaml file.
Once done with the configuration, deploy Dremio to Kubernetes. See how in Deploying Dremio to Kubernetes.
License
Provide your license key. To obtain a license, see Licensing.
Add this configuration under the parent, as shown in the following example:
dremio:
  license: "<license-goes-here>"
  ...
Pull Secret
Provide the secret used to pull the images from Quay.io. To create the Kubernetes secret, use this example:
Properties for Kubernetes secret
kubectl create secret docker-registry dremio-docker-secret --docker-server=quay.io --docker-username=your_username --docker-password=your_password_for_username --docker-email=DOCKER_EMAIL
For more information, see Create a Secret by providing credentials on the command line (the Docker registry is quay.io). All of the images are pulled using this same secret.
Pods can only reference image pull secrets in their own namespace, so create the secret in the namespace where Dremio is being deployed.
Add this configuration under the parent, as shown in the following example:
imagePullSecrets:
  - <your-k8s-secret-name>
Coordinator
Resource Configuration
Configure the volume size, resource limits, and resource requests. To configure these values, see Recommended Resources Configuration.
Add this configuration under the parents, as shown in the following example:
coordinator:
  resources:
    requests:
      cpu: 15
      memory: 30Gi
  volumeSize: 100Gi
  ...
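As a concrete illustration, the sketch below fills in the coordinator values with the Basic Configuration row of the table in Recommended Resources Configuration (8Gi memory, 4 CPU, 50Gi volume). Setting the memory limit equal to the request is an assumption made here for illustration only; move the numbers toward the Production Configuration values as your workload grows.

```yaml
coordinator:
  resources:
    requests:
      cpu: 4          # Basic Configuration CPU for the coordinator
      memory: 8Gi     # Basic Configuration memory for the coordinator
    limits:
      memory: 8Gi     # assumption: limit set equal to the request
  volumeSize: 50Gi    # Basic Configuration volume size
```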
Identity Provider
Optionally, you can configure authentication via an identity provider. Each type of identity provider requires an additional configuration file provided during Dremio's deployment.
Select the authentication type, and follow the corresponding link for instructions on how to create the associated configuration file:
- azuread - See how to configure Microsoft Entra ID with user and group lookup.
- ldap - See how to configure Dremio for LDAP.
- oauth - See how to configure Dremio for OpenID.
- oauth+ldap - See how to configure Dremio for Hybrid OpenID+LDAP.
Add this configuration under the parents, as shown in the following example:
coordinator:
  web:
    auth:
      type: <auth-type>
  ...
The identity provider configuration file can be embedded in your values-overrides.yaml. To do this, use the ssoFile option and provide the JSON content constructed per the instructions linked above. Here is an example for Microsoft Entra ID:
coordinator:
  web:
    auth:
      enabled: true
      type: "azuread"
      ssoFile: |
        {
          "oAuthConfig": {
            "clientId": "<my-client-id>",
            "clientSecret": "<my-secret>",
            "redirectUrl": "<my-redirect-url>",
            "authorityUrl": "https://login.microsoftonline.com/<my-tenant-id>/v2.0",
            "scope": "openid profile",
            "jwtClaims": {
              "userName": "preferred_username"
            }
          }
        }
  ...
For examples for the other types, see Identity Providers.
This is not the only configuration file that can be embedded inside the values-overrides.yaml file. However, these are generally used for advanced configurations. For more information, see Additional Configuration.
Transport Level Security
Optionally enable the desired level of TLS by setting enabled: true for client, Arrow Flight, or web TLS. To provide the TLS secret, see Creating a TLS Secret.
Add this configuration under the parents, as shown in the following example:
coordinator:
  client:
    tls:
      enabled: false
      secret: <my-tls-secret>
  flight:
    tls:
      enabled: false
      secret: <my-tls-secret>
  web:
    tls:
      enabled: false
      secret: <my-tls-secret>
  ...
If Web TLS is enabled, see Configuring Dremio Catalog when Coordinator Web is Using TLS.
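For example, a deployment that secures only the web endpoint might look like the sketch below. The secret name dremio-tls-secret-web is hypothetical; it stands for a TLS secret you have already created in the deployment namespace, as described in Creating a TLS Secret.

```yaml
coordinator:
  web:
    tls:
      enabled: true
      secret: dremio-tls-secret-web   # hypothetical name of an existing TLS secret
  client:
    tls:
      enabled: false                  # client TLS left off in this sketch
  flight:
    tls:
      enabled: false                  # Arrow Flight TLS left off in this sketch
```

Note that all three endpoints sit under a single coordinator key, in line with the grouping rule described earlier.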
Coordinator's Distributed Storage
This is where Dremio stores metadata, reflections, and uploaded files, and it's required for Dremio to be operational. The supported types are AWS S3 or S3-compatible storage, Azure Storage, and Google Cloud Storage (GCS). For examples of configurations, see Configuring the Distributed Storage.
Add this configuration under the parent, as shown in the following example:
distStorage:
  type: "<my-dist-store-type>"
  ...
Dremio Catalog
The configuration for Dremio Catalog has several options:
- Configuring storage for Dremio Catalog is mandatory since this is the location where Iceberg tables created in the Catalog will be written. For configuring the storage, see Configuring Storage for Dremio Catalog.
  Add this configuration under the parents, as shown in the following example:
  catalog:
    externalAccess:
      enabled: true
    ...
- (Optional) Use TLS for external access to require clients connecting to Dremio Catalog from outside the namespace to use TLS. To configure it, see Configuring TLS for Dremio Catalog External Access.
  Add this configuration under the parents, as shown in the following example:
  catalog:
    externalAccess:
      enabled: true
      tls:
        enabled: false
        secret: <my-catalog-tls-secret>
    ...
- (Optional) If Dremio coordinator web access is using TLS, additional configuration is necessary. To configure it, see Configuring Dremio Catalog When Coordinator Web Is Using TLS.
  Add this configuration under the parents, as shown in the following example:
  catalog:
    externalAccess:
      enabled: true
      authentication:
        authServerHostname: <my-auth-server-host>
    ...
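When several of these options are used together, they all belong under one catalog key. The sketch below combines them; the storage keys are taken from Configuring Storage for Dremio Catalog, and the nesting of tls and authentication under externalAccess is assumed from the order of the keys in the individual snippets, so verify it against the chart's values.yaml before relying on it.

```yaml
catalog:
  externalAccess:
    enabled: true
    tls:
      enabled: true
      secret: <my-catalog-tls-secret>
    authentication:                               # assumed to nest under externalAccess
      authServerHostname: <my-auth-server-host>
  storage:
    location: <your-object-store-path>
    type: <your-object-store-type>
```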
Save the values-overrides.yaml file.
Once done with the configuration, deploy Dremio to Kubernetes. See how in the topic Deploying Dremio to Kubernetes.
Configuring Your Values - Advanced
Dremio Platform Images
The Dremio platform requires 18 images when running fully featured. All images are published by Dremio to our Quay and are listed below. If you want to use a private mirror of our repository, add the snippets below to values-overrides.yaml to repoint to your own.
If creating a private mirror, use the same repository names and tags from Dremio's Quay.io. This is important for supportability.
dremio:
  image:
    repository: quay.io/dremio/dremio-enterprise
    tag: <The image tag from Quay.io>
busyBox:
  image:
    repository: quay.io/dremio/busybox
    tag: <The image tag from Quay.io>
k8s:
  image:
    repository: quay.io/dremio/alpine/k8s
    tag: <The image tag from Quay.io>
engine:
  operator:
    image:
      repository: quay.io/dremio/dremio-engine-operator
      tag: <The image tag from Quay.io>
zookeeper:
  image:
    repository: quay.io/dremio/zookeeper
    tag: <The image tag from Quay.io>
opensearch:
  image:
    repository: quay.io/dremio/dremio-search-opensearch
    tag: <The image tag from Quay.io> # The tag version must be a valid opensearch version as listed here https://opensearch.org/docs/latest/version-history/
  preInstallJob:
    image:
      repository: quay.io/dremio/dremio-search-init
      tag: <The image tag from Quay.io>
opensearchOperator:
  manager:
    image:
      repository: quay.io/dremio/dremio-opensearch-operator
      tag: <The image tag from Quay.io>
  kubeRbacProxy:
    image:
      repository: quay.io/dremio/kubebuilder/kube-rbac-proxy
      tag: <The image tag from Quay.io>
mongodbOperator:
  image:
    repository: quay.io/dremio/dremio-mongodb-operator
    tag: <The image tag from Quay.io>
mongodb:
  image:
    repository: quay.io/dremio/percona/percona-server-mongodb
    tag: <The image tag from Quay.io>
catalogservices:
  image:
    repository: quay.io/dremio/dremio-catalog-services-server
    tag: <The image tag from Quay.io>
catalog:
  image:
    repository: quay.io/dremio/dremio-catalog-server
    tag: <The image tag from Quay.io>
  externalAccess:
    image:
      repository: quay.io/dremio/dremio-catalog-server-external
      tag: <The image tag from Quay.io>
nats:
  container:
    image:
      repository: quay.io/dremio/nats
      tag: <The image tag from Quay.io>
  reloader:
    image:
      repository: quay.io/dremio/natsio/nats-server-config-reloader
      tag: <The image tag from Quay.io>
  natsBox:
    container:
      image:
        repository: quay.io/dremio/natsio/nats-box
        tag: <The image tag from Quay.io>
telemetry:
  image:
    repository: quay.io/dremio/otel/opentelemetry-collector-contrib
    tag: <The image tag from Quay.io>
Scale-out Coordinators
Dremio can scale to support high concurrency use cases through scaling coordinators. Multiple stateless coordinators rely on the primary coordinator to manage Dremio's state, enabling Dremio to support many more concurrent users. These scale-out coordinators are intended for high query throughput and are not applicable for standby or disaster recovery. While scale-out coordinators generally reduce the load on the primary coordinator, the primary coordinator's vCPU request should be increased for every two scale-outs added to avoid negatively impacting performance.
Perform this configuration in this section of the file, where count refers to the number of scale-outs. A count of 0 will provision only the primary coordinator:
coordinator:
  count: 1
  ...
When using scale-out coordinators, the load balancer session affinity should be enhanced. See: Advanced Load Balancer Configuration.
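Putting the two settings together, a deployment with two scale-out coordinators might be sketched as follows. The count of 2 is an arbitrary illustration; the sessionAffinity key is the one described in Advanced Load Balancer Configuration.

```yaml
coordinator:
  count: 2                # two scale-out coordinators in addition to the primary
service:
  type: LoadBalancer
  sessionAffinity: true   # keep each client session on the same coordinator
```

Remember to also raise the primary coordinator's vCPU request as scale-outs are added, as noted above.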
Configuring Kubernetes Pod Metadata (including Node Selector)
It's possible to add metadata both globally and to each of the StatefulSets (coordinators, classic engines, ZooKeeper, etc.), including configuring a node selector for pods to use specific node pools.
Define these values with caution and foreknowledge of expected entries because any misconfiguration may result in Kubernetes being unable to schedule your pods.
Use the following options to add metadata:
- labels - Configured using key-value pairs as shown in the following examples:
  Example of a global label
  labels:
    foo: bar
  Example of a StatefulSet label
  catalog:
    labels:
      foo: bar
    ...
  For more information on labels, see the Kubernetes documentation on Labels and Selectors.
- annotations - Configured using key-value pairs as shown in the following examples:
  Example of a global annotation
  annotations:
    foo: bar
  Example of a StatefulSet annotation
  mongodb:
    annotations:
      foo: bar
    ...
  For more information on annotations, see the Kubernetes documentation on Annotations.
- tolerations - Configured using a specific structure as shown in the following examples:
  Example of a global toleration
  tolerations:
    - key: "key1"
      operator: "Equal"
      value: "value1"
      effect: "NoSchedule"
  Example of a StatefulSet toleration
  catalog:
    tolerations:
      - key: "key1"
        operator: "Equal"
        value: "value1"
        effect: "NoSchedule"
    ...
  For more information on tolerations, see the Kubernetes documentation on Taints and Tolerations.
- nodeSelector - Configured using a specific structure as shown in the following examples:
  Example of a global node selector
  nodeSelector:
    nodetype: coordinator
  Example of a StatefulSet node selector
  coordinator:
    nodeSelector:
      nodetype: coordinator
    ...
The structure and values to use in the configurations are listed under "Metadata Structure and Values" below:
Metadata Structure and Values
For global metadata:
annotations: {}
labels: {}
tolerations: []
nodeSelector: {}
For StatefulSet metadata:
coordinator:
  annotations: {}
  labels: {}
  tolerations: []
  nodeSelector:
    nodetype: coordinator
executor:
  annotations: {}
  labels: {}
  tolerations: []
  nodeSelector:
    nodetype: coordinator
catalog:
  annotations: {}
  labels: {}
  tolerations: []
  nodeSelector:
    nodetype: catalog
catalogservices:
  annotations: {}
  labels: {}
  tolerations: []
  nodeSelector:
    nodetype: catalogservices
mongodb:
  annotations: {}
  labels: {}
  tolerations: []
  nodeSelector:
    nodetype: mongo
opensearch:
  annotations: {}
  labels: {}
  tolerations: []
  nodeSelector:
    nodetype: operators
  oidcProxy:
    annotations: {}
    labels: {}
    tolerations: []
    nodeSelector:
      nodeType: utils
  preInstallJob:
    annotations: {}
    labels: {}
    tolerations: []
    nodeSelector:
      nodeType: jobs
nats:
  podTemplate:
    merge:
      spec:
        annotations: {}
        labels: {}
        tolerations: []
        nodeSelector:
          nodetype: nats
mongodbOperator:
  annotations: {}
  labels: {}
  tolerations: []
  nodeSelector:
    nodetype: operators
opensearchOperator:
  annotations: {}
  labels: {}
  tolerations: []
  nodeSelector:
    nodetype: operators
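A node selector is often paired with a toleration so that pods both target a dedicated node pool and are allowed onto it. The sketch below assumes a hypothetical node pool labeled nodetype=coordinator and tainted with dedicated=coordinator:NoSchedule; neither the label nor the taint comes from Dremio, so substitute the ones used in your cluster.

```yaml
coordinator:
  nodeSelector:
    nodetype: coordinator       # hypothetical node label on the dedicated pool
  tolerations:
    - key: "dedicated"          # hypothetical taint key on the dedicated pool
      operator: "Equal"
      value: "coordinator"
      effect: "NoSchedule"
```

If the label or taint does not match any node, Kubernetes leaves the pods in the Pending state, which is the misconfiguration risk mentioned above.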
Configuring Extra Environment Variables
Optionally, you can define extra environment variables to be passed to either Coordinators or Executors. This can be done by adding the configuration under the parents as shown in the following example:
coordinator:
  extraEnvs:
    - name: <my-variable-name>
      value: "<my-variable-value>"
  ...
executor:
  extraEnvs:
    - name: <my-variable-name>
      value: "<my-variable-value>"
  ...
Environment variables defined as shown will be applied to Executors of both Classic Engines and New Engines.
Advanced Load Balancer Configuration
Dremio will create a public load balancer by default, and the Dremio Client service will provide an external IP to connect to Dremio. For more information, see Connecting to the Dremio Console.
- Private Cluster - For private Kubernetes clusters (no public endpoint), set internalLoadBalancer: true. Add this configuration under the parent as shown in the following example:
  service:
    type: LoadBalancer
    internalLoadBalancer: true
    ...
- Static IP - To define a static IP for your load balancer, set loadBalancerIP: <your-static-IP>. If unset, an available IP will be assigned upon creation of the load balancer. Add this configuration under the parent as shown in the following example:
  service:
    type: LoadBalancer
    loadBalancerIP: <my-desired-ip>
    ...
  Tip: This can be helpful if DNS is configured to expect Dremio to have a specific IP.
- Session Affinity - If leveraging Scale-out Coordinators, set sessionAffinity: true. Add this configuration under the parent as shown in the following example:
  service:
    type: LoadBalancer
    sessionAffinity: true
    ...
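These options all live under the same service key, so when more than one applies they should be grouped under a single service parent. The sketch below combines a private load balancer with a static IP; the address 10.0.0.50 is hypothetical, and whether your cloud provider honors a static IP on an internal load balancer is an assumption to verify.

```yaml
service:
  type: LoadBalancer
  internalLoadBalancer: true    # no public endpoint
  loadBalancerIP: 10.0.0.50     # hypothetical private address reserved for Dremio
```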
Advanced TLS Configuration for OpenSearch
Dremio generates TLS certificates by default for OpenSearch and they are rotated monthly. However, if you want to have your own, you need to create two secrets containing the relevant certificates. The format of the secrets is different from the other TLS secrets shown on this page, and the tls.crt, tls.key, and ca.crt files must be in PEM format. Use the example below as reference to create your secrets:
kubectl create secret generic opensearch-tls-certs \
--from-file=tls.crt --from-file=tls.key --from-file=ca.crt
kubectl create secret generic opensearch-tls-certs-admin \
--from-file=tls.crt --from-file=tls.key --from-file=ca.crt
Add the snippet below to the values-overrides.yaml file before deploying Dremio. Because OpenSearch requires TLS, if certificate generation is disabled, you must provide a certificate.
opensearch:
  tlsCertsSecretName: <opensearch-tls-certs>
  disableTlsCertGeneration: true
  ...
Advanced Configuration of Engines
Dremio's default resource offset is reserve-2-8, where the first value represents 2 vCPUs and the second represents 8 GB of RAM. If you need to change this default for your created engines, add the following snippet to values-overrides.yaml and set the defaultOffset to one of the configurable offsets listed below, which are available out of the box:
- reserve-0-0
- reserve-2-4
- reserve-2-8
- reserve-2-16
The listed values are keys and thus must be provided in this exact format in the snippet below.
engine:
  options:
    resourceAllocationOffsets:
      defaultOffset: reserve-2-8
  ...
Configuration of Classic Engines
- You should only use classic engines if the new ones introduced in Dremio 26.0 are not appropriate for your use case. Classic and new engines are not intended to be used side by side.
- Classic engines will not auto-start/auto-stop, which is only possible with the new engines.
The classic way of configuring engines is still supported, and you can add this snippet to values-overrides.yaml as part of the deployment. Note that this snippet is a configuration example, and you should adjust the values to your own case.
executor:
  resources:
    requests:
      cpu: "16"
      memory: "120Gi"
    limits:
      memory: "120Gi"
  engines: ["default"]
  count: 3
  volumeSize: 128Gi
  cloudCache:
    enabled: true
    volumes:
      - size: 128Gi
  ...
Telemetry
Telemetry egress is enabled by default. These metrics provide visibility into various components and services, ensuring optimal performance and reliability. To disable egress, add the following to your values-overrides.yaml:
telemetry:
  enabled: false
  ...
Disabling Parts of the Deployment
You can disable some components of the Dremio platform if their functionality does not pertain to your use case. The rest of Dremio continues to work when any of the components described in this section is disabled.
Semantic Search
To disable Semantic Search, add this configuration under the parent as shown in the following example:
opensearch:
  enabled: false
  replicas: 0
Additional Configuration
Dremio has several configuration and binary files to define the behavior for enabling authentication via an identity provider, logging, connecting to Hive, etc. During the deployment, these files are combined and used to create a Kubernetes ConfigMap. This ConfigMap is, in turn, used by the Dremio deployment as the source of truth for various settings. The options described below can be used to embed these configuration files in the values-overrides.yaml file.
To inspect Dremio's configuration files or perform a more complex operation not shown here, see Downloading Dremio's Helm Charts.
Additional Config Files
Use the configFiles option to add configuration files into your Dremio deployment. You can add multiple files, each as a key-value pair in which the key is the file name and the value is the file content. These can be TXT, XML, or JSON files. For example, here is how to embed the configuration for HashiCorp Vault, followed by a separate example file:
dremio:
  configFiles:
    vault_config.json: |
      {
        "vaultUrl": "https://my-vault.com",
        "namespace": "optional/dremio/global/vault/namespace",
        "auth": {
          "kubernetes": {
            "vaultRole": "dremio-vault-role",
            "serviceAccountJwt": "file:///optional/custom/path/to/serviceAccount/jwt",
            "loginMountPath": "optional/custom/kubernetes/login/path"
          }
        }
      }
    another_config.json: |
      {
        "key in this file": "content of this key"
      }
  ...
Additional Config Variables
Use the dremioConfExtraOptions option to add new variables to your Dremio deployment. For example, here is how to enable TLS between executors and coordinators, leveraging auto-generated self-signed certificates.
dremio:
  dremioConfExtraOptions:
    "services.fabric.ssl.enabled": true
    "services.fabric.ssl.auto-certificate.enabled": true
  ...
Additional Config Binary Files
Use the configBinaries option to provide binary configuration files (encoded as base64). For example, a JKS file for a custom truststore. The key is the file name, and the value is the file content. Add this configuration under the parents as shown in the following example:
dremio:
  configBinaries:
    custom-truststore.jks: "base64EncodedBinaryContent"
  ...
Additional Advanced Configs
Use the advancedConfigs option to enable advanced configurations and their details. Add this configuration under the parent as shown in the following example, which provides the password for a custom truststore that has one:
dremio:
  advancedConfigs:
    trustStore:
      enabled: true
      password: "<my-truststore-pass>"
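Because the options in this section share the dremio parent with the license key, they must be grouped under a single dremio key when used together. The sketch below combines snippets already shown on this page; it is an illustration of the grouping rule, not a recommended configuration.

```yaml
dremio:
  license: "<license-goes-here>"
  dremioConfExtraOptions:
    "services.fabric.ssl.enabled": true
    "services.fabric.ssl.auto-certificate.enabled": true
  configBinaries:
    custom-truststore.jks: "base64EncodedBinaryContent"
  advancedConfigs:
    trustStore:
      enabled: true
      password: "<my-truststore-pass>"
```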
Hive
Use the hive2ConfigFiles option to configure Hive 2. Add this configuration under the parents as shown in the following example:
dremio:
  hive2ConfigFiles:
    hive-site.xml: |
      <?xml version="1.0" encoding="UTF-8"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <configuration>
        <property>
          <name>hive.metastore.uris</name>
          <value>thrift://hive-metastore:9083</value>
        </property>
      </configuration>
  ...
Use the hive3ConfigFiles option to configure Hive 3. Add this configuration under the parents as shown in the following example:
dremio:
  hive3ConfigFiles:
    hive-site.xml: |
      <?xml version="1.0" encoding="UTF-8"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <configuration>
        <property>
          <name>hive.metastore.uris</name>
          <value>thrift://hive3-metastore:9083</value>
        </property>
      </configuration>
  ...
References
Recommended Resources Configuration
The table in this section contains the recommended values for resources requests and volume size to configure Dremio components. In the values-overrides.yaml file, set the following values:
resources:
  requests:
    memory: # Put here the value from the Memory column.
    cpu: # Put here the value from the CPU column.
volumeSize: # Put here the value from the Volume Size column, if any.
Dremio recommends using the Basic Configuration values for evaluation or testing purposes and adjusting them as you go towards the values in Production Configuration, which are the values Dremio recommends to operate in a production environment.
Dremio Component | Basic: Memory | Basic: CPU | Basic: Volume Size | Basic: Pod Count | Production: Memory | Production: CPU | Production: Volume Size | Production: Pod Count
---|---|---|---|---|---|---|---|---
Coordinator | 8Gi | 4 | 50Gi | 1 | 64Gi | 32 | 512Gi | 1
Catalog Server | 8Gi | 5 | - | 1 | 8Gi | 4 | - | 1
Catalog Server (External) | 8Gi | 6 | - | 1 | 8Gi | 4 | - | 1
Catalog Service Server | 8Gi | 7 | - | 1 | 8Gi | 4 | - | 1
Engine Operator | 1Gi | 1 | - | 1 | 1Gi | 1 | - | 1
OpenSearch | 8Gi | 1500m | 10Gi | 3 | 16Gi | 2 | 100Gi | 3
MongoDB | 2Gi | 4 | 10Gi | 3 | 4Gi | 8 | 512Gi [1] | 3
NATS | 1Gi | 700m | - | 3 | 1Gi | 700m | - | 3
ZooKeeper | 1Gi | 500m | - | 3 | 1Gi | 500m | - | 3
Open Telemetry | 1Gi | 1 | - | 1 | 1Gi | 1 | - | 1
[1] You can use a smaller volume size if you do not heavily use Iceberg.
The following YAML snippets show the resource configuration for the Dremio platform components:
Dremio Platform Resource Configuration YAML
coordinator:
  resources:
    requests:
      cpu: "32"
      memory: "64Gi"
    limits:
      memory: "64Gi"
  volumeSize: "512Gi"
catalog:
  requests:
    cpu: "4"
    memory: "8Gi"
  limits:
    cpu: "4"
    memory: "8Gi"
catalogservices:
  resources:
    requests:
      cpu: "4"
      memory: "8Gi"
    limits:
      cpu: "4"
      memory: "8Gi"
opensearch:
  resources:
    requests:
      memory: "16Gi"
      cpu: "2"
    limits:
      memory: "16Gi"
      cpu: "2"
mongodb:
  resources:
    requests:
      cpu: "2"
      memory: "2Gi"
    limits:
      cpu: "4"
      memory: "2Gi"
  storage:
    resources:
      requests:
        storage: "512Gi"
nats:
  resources:
    requests:
      cpu: "500m"
      memory: "1024Mi"
    limits:
      cpu: "750m"
      memory: "1536Mi"
zookeeper:
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      memory: "1Gi"
  volumeSize: "10Gi"
telemetry:
  resources:
    requests:
      cpu: "1"
      memory: "1Gi"
    limits:
      cpu: "2"
      memory: "2Gi"
Creating a TLS Secret
If you have enabled TLS in your values-overrides.yaml, the corresponding secrets must be created before deploying Dremio. To create a secret, run the following command:
kubectl create secret tls <your-tls-secret-name> --key privkey.pem --cert cert.pem
For more information, see kubectl create secret tls.
TLS for OpenSearch requires a secret of a different makeup. See Advanced TLS Configuration for OpenSearch.
Configuring the Distributed Storage
Dremio’s distributed store uses scalable and fault-tolerant storage and it is configured as follows:
- In the values-overrides.yaml file, find the section with distStorage: and type:
  distStorage:
    type: "<my-dist-store-type>"
    ...
- In type:, configure your storage provider with one of the following values:
  - "gcp" - For GCP Cloud Storage.
  - "aws" - For AWS S3 or S3-compatible storage.
  - "azureStorage" - For Azure Storage.
- Find the template below for the storage provider you have configured in step 2, copy the template, paste it below the line with type:, and configure your distributed storage values. Templates are provided for:
  - Google Cloud Platform (GCP)
  - AWS S3 & S3-Compatible
  - Azure Storage
Template for Google Cloud Platform (GCP):
type: "gcp"
gcp:
  bucketName: "GCS Bucket Name"
  path: "/"
  authentication: "auto"
  # If using serviceAccountKeys, uncomment the section below, referencing the values from
  # the service account credentials JSON file that you generated:
  #credentials:
  #  projectId: GCP Project ID that the Google Cloud Storage bucket belongs to.
  #  clientId: Client ID for the service account that has access to the Google Cloud Storage bucket.
  #  clientEmail: Email for the service account that has access to the Google Cloud Storage bucket.
  #  privateKeyId: Private key ID for the service account that has access to Google Cloud Storage bucket.
  #  privateKey: |-
  #    -----BEGIN PRIVATE KEY-----\n Replace me with full private key value. \n-----END PRIVATE KEY-----\n
Where:
- bucketName - The name of the GCS bucket for distributed storage.
- path - The path relative to the bucket to create Dremio's directories.
- authentication - Valid types are: serviceAccountKeys or auto.
  - When using auto, Dremio uses Google Application Default Credentials to authenticate. This is platform-dependent and may not be available in all Kubernetes clusters.
  - When using a GCS bucket on GKE, we recommend enabling Workload Identity and configuring a Kubernetes service account for Dremio with an associated workload identity that has access to the GCS bucket.
- credentials - If using serviceAccountKeys authentication, uncomment the credentials section in the template above.
Template for AWS S3 & S3-Compatible:
type: "aws"
aws:
  bucketName: "AWS Bucket Name"
  path: "/"
  authentication: "metadata"
  # If using accessKeySecret for authentication against S3, uncomment the lines below and use the values
  # to configure the appropriate credentials.
  #
  #credentials:
  #  accessKey: "AWS Access Key"
  #  secret: "AWS Secret"
  #
  # If using awsProfile for authentication against S3, uncomment the lines below and use the values
  # to choose the appropriate profile.
  #
  #credentials:
  #  awsProfileName: "default"
  #
  # Extra Properties
  # Use the extra properties block to provide additional parameters to configure the distributed
  # storage in the generated core-site.xml file.
  #
  #extraProperties: |
  #  <property>
  #    <name>The property name</name>
  #    <value>The property value</value>
  #  </property>
Where:
- bucketName - The name of the S3 bucket for distributed storage.
- path - The path relative to the bucket to create Dremio's directories.
- authentication - Valid types are metadata, accessKeySecret, and awsProfile.
  Note: Metadata is only supported in AWS EKS and requires that the EKS worker node IAM role is configured with sufficient access rights. At this time, Dremio does not support using a Kubernetes service account-based IAM role.
- credentials - If using accessKeySecret authentication, uncomment the credentials section in the template above.
- extraProperties - Additional parameters to configure the distributed storage in the generated core-site.xml file. For example:
  extraProperties: |
    <property>
      <name>fs.s3a.endpoint</name>
      <value>0.0.0.0</value>
    </property>
    <property>
      <name>fs.s3a.path.style.access</name>
      <value>true</value>
    </property>
    <property>
      <name>fs.s3a.connection.ssl.enabled</name>
      <value>false</value>
    </property>
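To show how the pieces fit once the template is pasted under type:, here is a sketch of a complete distStorage section for an S3-compatible store using accessKeySecret authentication. The bucket name dremio-dist and the endpoint minio.example.internal:9000 are hypothetical placeholders; the keys themselves are the ones from the template above.

```yaml
distStorage:
  type: "aws"
  aws:
    bucketName: "dremio-dist"              # hypothetical bucket name
    path: "/"
    authentication: "accessKeySecret"
    credentials:
      accessKey: "<access-key>"
      secret: "<secret-key>"
    extraProperties: |
      <property>
        <name>fs.s3a.endpoint</name>
        <value>minio.example.internal:9000</value>
      </property>
      <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
      </property>
```

The aws block is a sibling of type, both nested under the single distStorage parent.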
Template for Azure Storage:
type: "azureStorage"
azureStorage:
  accountName: "Azure Storage Account Name"
  authentication: "accessKey"
  filesystem: "Azure Storage Account Blob Container"
  path: "/"
  credentials:
    # If using accessKey for authentication against Azure Storage, uncomment the lines below and use the values
    # to configure the appropriate credentials.
    #accessKey: "Azure Storage Account Access Key"
    #
    # If using entraID for authentication against Azure Storage, uncomment the lines below and use the values
    # to configure the appropriate credentials.
    #clientId: "Azure Application Client ID"
    #tokenEndpoint: "Azure Entra ID Token Endpoint"
    #clientSecret: "Azure Application Client Secret"
Where:
- accountName - The name of the storage account.
- authentication - Valid types are: accessKey or entraID.
- filesystem - The name of the blob container to use within the storage account.
- path - The path relative to the filesystem to create Dremio's directories.
- credentials - Relevant content for accessKey or entraID.
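As an illustration of the entraID variant, the sketch below shows a complete distStorage section with the three Entra ID credentials uncommented. All values are placeholders to replace with your own; the keys are those from the Azure template above.

```yaml
distStorage:
  type: "azureStorage"
  azureStorage:
    accountName: "<storage-account-name>"
    authentication: "entraID"
    filesystem: "<blob-container-name>"
    path: "/"
    credentials:
      clientId: "<application-client-id>"
      tokenEndpoint: "<entra-id-token-endpoint>"
      clientSecret: "<application-client-secret>"
```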
Configuring Storage for Dremio Catalog
To use Dremio Catalog, configure the storage settings based on your storage provider (for example, Amazon S3, Azure Storage, or Google Cloud Storage). This configuration is required to enable support for vended credentials and to allow access to the table metadata necessary for Iceberg table operations.
- In the values-overrides.yaml file, find the section to configure your storage provider under the parents, as shown in the following example:
  catalog:
    storage:
      location: <your-object-store-path>
      type: <your-object-store-type>
    ...
- To configure it, follow the steps for your storage provider:
  - Amazon S3
  - S3-compatible
  - Azure Storage
  - Google Cloud Storage
To use Dremio Catalog with Amazon S3, do the following:
- Create an IAM user or use an existing IAM user for Dremio Catalog.
- Create an IAM policy that grants access to your S3 location. For example:
  Example of a policy
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [
          "s3:PutObject",
          "s3:GetObject",
          "s3:GetObjectVersion",
          "s3:DeleteObject",
          "s3:DeleteObjectVersion"
        ],
        "Resource": "arn:aws:s3:::<my_bucket>/*"
      },
      {
        "Effect": "Allow",
        "Action": [
          "s3:ListBucket",
          "s3:GetBucketLocation"
        ],
        "Resource": "arn:aws:s3:::<my_bucket>",
        "Condition": {
          "StringLike": {
            "s3:prefix": [
              "*"
            ]
          }
        }
      }
    ]
  }
- Create an IAM role to grant privileges to the S3 location.
  - In your AWS console, select Create Role.
  - Enter an externalId. For example, my_catalog_external_id.
  - Attach the policy created in the previous step and create the role.
- Create IAM user permissions to access the bucket via STS:
  - Select the IAM role created in the previous step.
  - Edit the trust policy and add the following:
    Trust policy
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "",
          "Effect": "Allow",
          "Principal": {
            "AWS": "<dremio_catalog_user_arn>"
          },
          "Action": "sts:AssumeRole",
          "Condition": {
            "StringEquals": {
              "sts:ExternalId": "<dremio_catalog_external_id>"
            }
          }
        }
      ]
    }
    Replace the following values with the ones obtained in the previous steps:
    - <dremio_catalog_user_arn>: The ARN of the IAM user that was created in the first step.
    - <dremio_catalog_external_id>: The external ID that was created in the third step.
  Note: The sts:AssumeRole permission is required for Dremio Catalog to function with vended credentials, as it relies on the STS temporary token to perform these validations.
- Configure Dremio Catalog in the values-overrides.yaml file as follows:
  catalog:
    storage:
      location: s3://<your_bucket>/<your_folder>
      type: S3
      s3:
        region: <bucket_region>
        roleArn: <dremio_catalog_iam_role> # The role that was created in the third step
        userArn: <dremio_catalog_user_arn> # The IAM user that was created in the first step
        externalId: <dremio_catalog_external_id> # The external ID that was created in the third step
        useAccessKeys: false # Set to true if you intend to use access keys. See the note below.
    ...
  Note: If your role requires AWS secret keys to access the bucket and STS, you must create a Kubernetes secret named catalog-server-s3-storage-creds to access the configured location. For example:
  Example for Amazon S3 using an access key and a secret key
  export AWS_ACCESS_KEY_ID=<access-key>
  export AWS_SECRET_ACCESS_KEY=<secret-key>
  kubectl create secret generic catalog-server-s3-storage-creds \
    --namespace "$NAMESPACE" \
    --from-literal awsAccessKeyId="$AWS_ACCESS_KEY_ID" \
    --from-literal awsSecretAccessKey="$AWS_SECRET_ACCESS_KEY"
  Prerequisites
  - The access keys must have permissions to access the bucket and the STS server.
  - In the Dremio console, select Master Credentials when adding Dremio Catalog.
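The placeholder substitution in the trust policy from the Amazon S3 steps can be scripted. The following is a minimal sketch, assuming a POSIX shell; the ARN, the external ID, and the file name trust-policy.json are hypothetical values chosen for illustration, not values from Dremio.

```shell
# Hypothetical values: replace with the user ARN and external ID from your own account.
DREMIO_CATALOG_USER_ARN="arn:aws:iam::123456789012:user/dremio-catalog"
DREMIO_CATALOG_EXTERNAL_ID="my_catalog_external_id"

# Write the trust policy with both placeholders filled in.
# The heredoc delimiter is unquoted, so the shell expands the variables.
cat > trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "${DREMIO_CATALOG_USER_ARN}"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "${DREMIO_CATALOG_EXTERNAL_ID}"
        }
      }
    }
  ]
}
EOF

# Print the rendered policy for review before adding it to the role.
cat trust-policy.json
```

If you work with the AWS CLI rather than the console, the rendered file can be passed to aws iam update-assume-role-policy with --policy-document file://trust-policy.json.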
To use Dremio Catalog with S3-compatible storage, do the following:
- Create a Kubernetes secret named catalog-server-s3-storage-creds to access the configured location. Here is an example for S3 using an access key and a secret key:
  export AWS_ACCESS_KEY_ID=<username>
  export AWS_SECRET_ACCESS_KEY=<password>
  kubectl create secret generic catalog-server-s3-storage-creds \
    --namespace "$NAMESPACE" \
    --from-literal awsAccessKeyId="$AWS_ACCESS_KEY_ID" \
    --from-literal awsSecretAccessKey="$AWS_SECRET_ACCESS_KEY"
  For S3-compatible storage providers (e.g., MinIO), the access keys should be the username and password.
- For this step, follow the instructions that match whether your S3-compatible storage has STS support:
  Has STS support
  Dremio Catalog uses STS as the mechanism for vending credentials, so configure Dremio Catalog in the values-overrides.yaml file as follows:
  catalog:
    storage:
      location: s3://<your_bucket>/<your_folder>
      type: S3
      s3:
        region: <bucket_region>
        roleArn: arn:aws:iam::000000000000:role/catalog-access-role # The value does not matter; it is a dummy role.
        endpoint: <s3-compatible-server-url> # The S3 server URL, for example for MinIO: http://<minio-host>:<minio-port>
        stsEndpoint: <s3-compatible-sts-server-url> # The STS server URL, for example for MinIO: http://<minio-host>:<minio-port>
        pathStyleAccess: true # Must be true
        useAccessKeys: true # Must be true
    ...
  No STS support
  Vended credentials will not work. In this case, you must use "Master Credentials" in Dremio and provide explicit access keys for external engines where they are required.
  Once the Kubernetes secrets for the access keys have been created, configure Dremio Catalog in the values-overrides.yaml file as follows:
  catalog:
    storage:
      location: s3://<your_bucket>/<your_folder>
      type: S3
      s3:
        region: <bucket_region>
        roleArn: arn:aws:iam::000000000000:role/catalog-access-role # The value does not matter; it is a dummy role.
        endpoint: <s3-compatible-server-url> # The S3 server URL, for example for MinIO: http://<minio-host>:<minio-port>
        pathStyleAccess: true # Must be true
        skipSts: true # Must be true
        useAccessKeys: true # Must be true
    ...
To use Dremio Catalog with Azure Storage, do the following:
- Register an application and create secrets:
  - Go to Azure Active Directory > App Registrations.
  - Register your app, and take note of the Client ID and Tenant ID. For more information on these steps, refer to Register an application with Microsoft Entra ID and create a service principal.
  - Go to Certificates & Secrets > New Client Secret.
  - Create a secret, and take note of the Secret Value.
  - Create a Kubernetes secret named catalog-server-azure-storage-creds using the following command:
    export AZURE_CLIENT_ID=<Azure App client id>
    export AZURE_CLIENT_SECRET=<App secret value>
    kubectl create secret generic catalog-server-azure-storage-creds \
      --namespace "$NAMESPACE" \
      --from-literal azureClientId="$AZURE_CLIENT_ID" \
      --from-literal azureClientSecret="$AZURE_CLIENT_SECRET"
- Assign an IAM role in your storage account so that your new application has permission to access it:
  - In the Azure console, go to your Storage Account and navigate to Access Control (IAM) > Role assignments > Add role assignment.
  - Select the Storage Blob Data Contributor role and click Next.
  - In the Members section, click Select members, search for the app registration from the first step, and click Select.
  - Review and assign the role.
- Configure Dremio Catalog in the values-overrides.yaml file as follows:
  catalog:
    storage:
      location: abfss://<container_name>@<storage_account>.dfs.core.windows.net/<path>
      type: azure
      azure:
        tenantId: <your Azure directory tenant ID>
        multiTenantAppName: ~ # Optional: used only if you registered a multi-tenant app.
        useClientSecrets: true # Must be true
    ...
To use Dremio Catalog with Google Cloud Storage (GCS), do the following:
- In Google Cloud Platform (GCP), create a service account and grant it an IAM role with the following permissions:
  storage.buckets.get
  storage.objects.create
  storage.objects.delete
  storage.objects.get
  storage.objects.list
- Obtain the JSON file with the GCP credentials from the Google service account.
- Create the Kubernetes secret in the namespace where Dremio is deployed using the following command:
  kubectl create secret generic catalog-server-gcs-storage-creds --from-file=<filename>.json
- Configure Dremio Catalog in the values-overrides.yaml file as follows:
  catalog:
    ...
    storage:
      location: gs://<bucket>/<path>
      type: GCS
      gcs:
        useCredentialsFile: true
Configuring TLS for Dremio Catalog External Access
For clients connecting to Dremio Catalog from outside the namespace, TLS can be enabled for Dremio Catalog external access as follows:
- Enable external access with TLS and provide the TLS secret. See the section Creating a TLS Secret.
- In the values-overrides.yaml file, find the Dremio Catalog configuration section:
  catalog:
    ...
- Configure TLS for Dremio Catalog as follows:
  catalog:
    externalAccess:
      enabled: true
      tls:
        enabled: true
        secret: dremio-tls-secret-catalog
    ...
Configuring Dremio Catalog when Coordinator Web is Using TLS
When the Dremio coordinator uses TLS for Web access (i.e., when coordinator.web.tls.enabled is set to true), Dremio Catalog external access must be configured to match, or client authentication will fail. Configure Dremio Catalog as follows:
- In the values-overrides.yaml file, find the Dremio Catalog configuration section:
  catalog:
    ...
- Configure Dremio Catalog as follows:
  catalog:
    externalAccess:
      enabled: true
      authentication:
        authServerHostname: dremio-master-0.dremio-cluster-pod.{{ .Release.Namespace }}.svc.cluster.local
    ...
  The authServerHostname must match the CN (or the SAN) field of the (master) coordinator Web TLS certificate.
  If it does not match the CN or SAN fields of the TLS certificate, you can, as a last resort, disable hostname verification (disableHostnameVerification: true):
  catalog:
    externalAccess:
      enabled: true
      authentication:
        authServerHostname: dremio-master-0.dremio-cluster-pod.{{ .Release.Namespace }}.svc.cluster.local
        disableHostnameVerification: true
    ...
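The {{ .Release.Namespace }} part of authServerHostname appears to be a Helm template expression that stands for the namespace of the release. As a rough sketch, the following prints the hostname that results for a given namespace, which is the name the coordinator certificate's CN or SAN would need to contain. The namespace value dremio is a hypothetical example.

```shell
# Hypothetical namespace: use the namespace you deploy Dremio into.
NAMESPACE="dremio"

# Expand the hostname pattern from the configuration above for that namespace.
AUTH_SERVER_HOSTNAME="dremio-master-0.dremio-cluster-pod.${NAMESPACE}.svc.cluster.local"
echo "$AUTH_SERVER_HOSTNAME"
```

To compare this name against the certificate, a command such as openssl x509 -in <certificate-file> -noout -text prints the subject and any Subject Alternative Name entries.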
Downloading Dremio's Helm Chart
You can perform more advanced configurations beyond those described in this topic. However, proceed with caution: making changes without a clear understanding may lead to unexpected or undesired behavior. To perform an advanced configuration, you must pull Dremio's Helm charts.
Pull the Helm charts using the following command:
helm pull oci://quay.io/dremio/dremio-helm:<image-tag> --untar
This creates a new directory called dremio-helm containing the charts.
For more information, see Helm Pull.
Overriding Additional Values
After completing the helm pull:
- Find the values.yaml file, open it, and check the configurations you want to override.
- Copy what you want to override from values.yaml to values-overrides.yaml and configure the file with your values.
- Save the values-overrides.yaml file.
Once done with the configuration, deploy Dremio to Kubernetes via the OCI Repo. See how in Deploying Dremio to Kubernetes.
Manual Modifications to Deployment Files
For modifications to these files to take effect, you must install Dremio using a local version of the Helm charts. Thus, the helm install command must reference a local folder, not an OCI repo such as Quay. For more information and sample commands, see Helm install.
After completing the helm pull, you can edit the charts directly. This may be necessary to add deployment-specific modifications not catered for in the Additional Configuration section. These would typically require modifications to files in the /config directory. Any customizations to your Dremio environment are propagated to all the pods when installing or upgrading the deployment.
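As a hedged sketch of installing from a local chart folder, the commands below check that the pulled chart and the overrides file exist before invoking Helm. The release name dremio, the namespace dremio, and the assumption that Chart.yaml sits at the top of the dremio-helm directory are illustrative; adjust them to your environment.

```shell
CHART_DIR="./dremio-helm"            # directory created by 'helm pull ... --untar'
VALUES_FILE="values-overrides.yaml"  # your overrides file
RELEASE="dremio"                     # hypothetical release name
NAMESPACE="dremio"                   # hypothetical namespace

if [ -f "$CHART_DIR/Chart.yaml" ] && [ -f "$VALUES_FILE" ]; then
  # Install from the local folder rather than the OCI repo,
  # so that manual edits to the chart files are picked up.
  helm install "$RELEASE" "$CHART_DIR" \
    --namespace "$NAMESPACE" \
    -f "$VALUES_FILE"
  STATUS="helm exited with status $?"
else
  STATUS="skipped: chart or values file not found"
fi
echo "$STATUS"
```

Checking for the files first avoids a confusing Helm error when the command is run from the wrong directory.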