Connecting to Dremio Catalog from Apache Spark
You can use any Iceberg REST-compatible engine to read from and write to Dremio Catalog. This page describes how to use Spark to connect to Dremio Catalog.
When using Spark, you can authenticate with Dremio using either of the following methods:
- Dremio Personal Access Token (PAT)
- OAuth2 with an external identity provider (IdP)
Spark also needs additional client-side configuration to authenticate with Dremio. The required settings are described in the respective sections below.
Prerequisites
- Enable Dremio Personal Access Tokens (PATs).
- Configure Spark to use Iceberg 1.9+. If you can’t upgrade to 1.9, refer to the instructions on authenticating with Iceberg versions older than 1.9.
- Download and install the required JAR file from the Dremio GitHub repository.
- Add the JAR file to Spark using the spark-sql command with the --jars option:
spark-sql --jars /path/to/authmgr-oauth2-runtime-0.0.1-dremio.jar
If you intend to use vended credentials, pass the following configuration to the spark-sql command:
spark-sql .. --conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials
Note: Ensure that the warehouse is set to default, as this is the warehouse used by Dremio Catalog.
Authenticating with Dremio Using Dremio PAT
Use this method if you want to use Spark with Dremio internal users. This method follows a two-step process:
1. Create a Dremio PAT
Select the user that will authenticate Spark jobs and create a Dremio PAT for that user. Then follow the next step to configure Spark to use the PAT.
2. Configure Spark to Use PAT to Access Dremio Catalog
The following example Spark configuration allows Spark to connect to Dremio Catalog over the Iceberg REST API, using a PAT for authentication:
export DREMIO_PAT=...
export DREMIO_ADDRESS=...
spark-sql \
--jars /path/to/authmgr-oauth2-runtime-0.0.1-dremio.jar \
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.polaris.cache-enabled=false \
--conf spark.sql.catalog.polaris.type=rest \
--conf spark.sql.catalog.polaris.warehouse=default \
--conf spark.sql.catalog.polaris.uri=http://$DREMIO_ADDRESS:8181/api/catalog \
--conf spark.sql.catalog.polaris.rest.auth.type=com.dremio.iceberg.authmgr.oauth2.OAuth2Manager \
--conf spark.sql.catalog.polaris.rest.auth.oauth2.token-endpoint=http://$DREMIO_ADDRESS:9047/oauth/token \
--conf spark.sql.catalog.polaris.rest.auth.oauth2.grant-type=token_exchange \
--conf spark.sql.catalog.polaris.rest.auth.oauth2.client-id=dremio \
--conf spark.sql.catalog.polaris.rest.auth.oauth2.scope=dremio.all \
--conf spark.sql.catalog.polaris.rest.auth.oauth2.token-exchange.subject-token="$DREMIO_PAT" \
--conf spark.sql.catalog.polaris.rest.auth.oauth2.token-exchange.subject-token-type=urn:ietf:params:oauth:token-type:dremio:personal-access-token
- The “dremio” Client ID is not used for actual authentication. It can be any string.
- DREMIO_PAT represents the Dremio Personal Access Token (PAT).
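Because the values of DREMIO_PAT and DREMIO_ADDRESS are elided in the example above, launching with either one unset passes an empty value to Spark, which may only surface later as an authentication or connection error. A small guard before the spark-sql call can catch this early. The sketch below is one way to do that in POSIX shell; the helper name require_env is hypothetical and is not part of Dremio or Spark.

```shell
# require_env NAME...
# Hypothetical helper (not part of Dremio or Spark): returns non-zero and
# reports the first named variable that is unset or empty.
require_env() {
  for _name in "$@"; do
    # Indirect lookup of the variable called $_name (POSIX-compatible).
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      echo "missing required variable: $_name" >&2
      return 1
    fi
  done
}

# Usage, before the spark-sql command shown above:
#   require_env DREMIO_PAT DREMIO_ADDRESS || exit 1
```

The helper only checks that the variables are non-empty; it does not validate the PAT itself.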
Authenticating with Dremio Using OAuth2 (External Identity Provider)
Use this method if you want to use Spark with users defined in an external identity provider, e.g., Keycloak.
1. Configure Dremio to Use OAuth2 to Authenticate Spark
First, establish trust between Dremio and your identity provider.

- Choose any value for “Audience”. This value is critical for disambiguating different access paths that may involve the same IdP.
- The value to set for “User Claim Mapping” depends on the IdP. It should point to the token claim that contains the username that Dremio should use to map external users to internal users.
- “Issuer URL” must be the same as the issuer URL seen from the Spark environment; otherwise, token exchange will fail.
- Configure OAuth in the IdP so that Spark clients can obtain tokens for the “Audience” configured above. This document uses Keycloak as an example: the “dremio-catalog-cli” client is configured in Keycloak and assigned a new “catalog” scope. An “Audience” mapper in the “catalog” scope then produces the custom “poc” Audience value.
2. Configure Spark to Use OAuth2
The following example connects Spark to Dremio Catalog using an external IdP for user authentication. In summary, the process works as follows:
- Spark obtains a user-specific access token from an OAuth2 server (usually the IdP). Dremio requires that the token be in the form of a JWT for this use case.
- Spark connects to Dremio and exchanges the user’s IdP access token for a Dremio Access Token.
- Spark connects to Dremio Catalog using the Dremio Access Token.
export KEYCLOAK_ADDRESS=...
export DREMIO_ADDRESS=...
export CLIENT_SECRET=...
spark-sql \
--jars /path/to/authmgr-oauth2-runtime-0.0.1-dremio.jar \
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.polaris.cache-enabled=false \
--conf spark.sql.catalog.polaris.type=rest \
--conf spark.sql.catalog.polaris.warehouse=default \
--conf spark.sql.catalog.polaris.uri=http://$DREMIO_ADDRESS:8181/api/catalog \
--conf spark.sql.catalog.polaris.rest.auth.type=com.dremio.iceberg.authmgr.oauth2.OAuth2Manager \
--conf spark.sql.catalog.polaris.rest.auth.oauth2.issuer-url=http://$KEYCLOAK_ADDRESS:8080/realms/iceberg \
--conf spark.sql.catalog.polaris.rest.auth.oauth2.grant-type=device_code \
--conf spark.sql.catalog.polaris.rest.auth.oauth2.client-id=dremio \
--conf spark.sql.catalog.polaris.rest.auth.oauth2.client-secret=$CLIENT_SECRET \
--conf spark.sql.catalog.polaris.rest.auth.oauth2.scope=catalog \
--conf spark.sql.catalog.polaris.rest.auth.oauth2.impersonation.enabled=true \
--conf spark.sql.catalog.polaris.rest.auth.oauth2.impersonation.token-endpoint=http://$DREMIO_ADDRESS:9047/oauth/token \
--conf spark.sql.catalog.polaris.rest.auth.oauth2.impersonation.scope=dremio.all \
--conf spark.sql.catalog.polaris.rest.auth.oauth2.token-exchange.subject-token-type=urn:ietf:params:oauth:token-type:jwt
- The “catalog” client scope in Spark matches the “catalog” scope in Keycloak.
- “dremio” in Spark matches the Keycloak client that has the “poc” Audience Mapper.
Using Dremio PAT for Authentication with Iceberg Versions Older Than 1.9
If you are using a version of Iceberg older than 1.9, you must run the OAuth2 token exchange flow against Dremio yourself to obtain an access token, because those versions do not include AuthManager. Any OAuth2 client can be used for this. The example below uses curl for simplicity:
export DREMIO_PAT=...
export DREMIO_ADDRESS=...
curl -X POST https://$DREMIO_ADDRESS:9047/oauth/token -d "grant_type=urn:ietf:params:oauth:grant-type:token-exchange&scope=dremio.all&subject_token_type=urn:ietf:params:oauth:token-type:dremio:personal-access-token" --data-urlencode "subject_token=$DREMIO_PAT"
Extract the access token from the output of the token exchange flow. The examples below assume the token is stored in the $DREMIO_TOKEN variable.
- The token exchange output will also provide a token expiry period.
- It is also possible to obtain the access token via a custom IdP, but this is technically more challenging. Contact Dremio for more information if you need this use case.
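The extraction step described above can be scripted. The sketch below assumes that the token endpoint returns a JSON body in the standard OAuth2 form (RFC 6749), with the token in an access_token field and the expiry period in an expires_in field. Check the actual response from your Dremio deployment before relying on these field names. The function names are hypothetical, and python3 is used only as a JSON parser.

```shell
# Hypothetical helpers, assuming an RFC 6749-style JSON token response.

# Reads a token response on stdin and prints the value of the named field.
token_response_field() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)[sys.argv[1]])' "$1"
}

# Runs the token exchange request shown above and prints the raw response.
# Requires DREMIO_ADDRESS and DREMIO_PAT to be set.
dremio_token_exchange() {
  curl -s -X POST "https://$DREMIO_ADDRESS:9047/oauth/token" \
    -d "grant_type=urn:ietf:params:oauth:grant-type:token-exchange&scope=dremio.all&subject_token_type=urn:ietf:params:oauth:token-type:dremio:personal-access-token" \
    --data-urlencode "subject_token=$DREMIO_PAT"
}

# Usage:
#   response=$(dremio_token_exchange)
#   export DREMIO_TOKEN=$(printf '%s' "$response" | token_response_field access_token)
#   printf '%s' "$response" | token_response_field expires_in
```

Because the token expires, a long-running job that uses this flow needs to repeat the exchange and restart Spark with a fresh $DREMIO_TOKEN once the expiry period has passed.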
Configuring Spark to Use an OAuth Token
The following example Spark configuration allows Spark to connect to Dremio Catalog over the Iceberg REST API, using an OAuth token for authentication:
export DREMIO_TOKEN=...
export DREMIO_ADDRESS=...
spark-sql \
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.polaris.cache-enabled=false \
--conf spark.sql.catalog.polaris.type=rest \
--conf spark.sql.catalog.polaris.warehouse=default \
--conf spark.sql.catalog.polaris.uri=http://$DREMIO_ADDRESS:8181/api/catalog \
--conf spark.sql.catalog.polaris.token="$DREMIO_TOKEN"