Apache Spark

To connect Apache Spark to Arctic, you need to set the following properties as you initialize Spark SQL:

| Configuration property name | Value | Purpose |
| --- | --- | --- |
| spark.sql.catalog.<catalog_name>.uri | Arctic catalog endpoint | Tells the query engine the location of the Iceberg catalog. |
| spark.sql.catalog.<catalog_name>.authentication.type | BEARER | Tells the query engine the authentication type to use for the Iceberg catalog. |
| spark.sql.catalog.<catalog_name>.authentication.token | <personal-access-token> | Tells the query engine the authentication token to use for the Iceberg catalog. |
| spark.sql.catalog.<catalog_name>.warehouse | Path to the Amazon S3 bucket | Tells the query engine where to create tables. |

Set the properties as follows:

  • Replace <catalog_name> with a unique value (for example, arctic or myArcticCatalog).

  • Set the authentication type to BEARER.

  • Set the authentication token to the personal access token you generated in your Dremio Cloud organization. If you have not created a personal access token, see Personal Access Tokens for information about how Dremio Cloud uses these tokens and how to generate one.

    note

    The personal access token identifies the Dremio user who can read and write data as well as the user's role-based access control (RBAC) permissions on the Arctic catalog. Dremio enforces the user's existing Arctic RBAC permissions when they read and write data through the Apache Spark engine.

  • Set the warehouse to the location (a path to on-premises or cloud storage) that the query engine should use to create tables and write data.

Here is an example initialization that you can run to start up a Spark SQL session with the <catalog_name> set to arctic:

Example Initialization
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.1,org.projectnessie:nessie-spark-extensions:0.44.0,software.amazon.awssdk:bundle:2.17.178,software.amazon.awssdk:url-connection-client:2.17.178 \
--conf spark.sql.extensions="org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSpark32SessionExtensions" \
--conf spark.sql.catalog.arctic.uri=https://nessie.dremio.cloud/repositories/52e5d5db-f48d-4878-b429-815ge9fdw4c6/api/v1 \
--conf spark.sql.catalog.arctic.ref=main \
--conf spark.sql.catalog.arctic.authentication.type=BEARER \
--conf spark.sql.catalog.arctic.catalog-impl=org.apache.iceberg.nessie.NessieCatalog \
--conf spark.sql.catalog.arctic.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
--conf spark.sql.catalog.arctic=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.arctic.authentication.token=RDViJJHrS/u+JAwrzQVV2+kAuLxiNkbTgdWQKQhAUS72o2BMKuRWDnjuPEjACw== \
--conf spark.sql.catalog.arctic.warehouse=s3://arctic-test-bucket
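
If you prefer to initialize the session from Python instead of the spark-sql shell, you can set the same properties on a SparkSession builder. The following is a minimal sketch rather than an official Dremio example: the repository ID, personal access token, and bucket name are placeholders you must replace, and it assumes PySpark 3.2 to match the package versions above.

Example Initialization in PySpark
from pyspark.sql import SparkSession

# Minimal sketch of the same connection, configured programmatically.
# <repository-id>, <personal-access-token>, and <bucket-name> are placeholders.
spark = (
    SparkSession.builder
    .config(
        "spark.jars.packages",
        "org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.1,"
        "org.projectnessie:nessie-spark-extensions:0.44.0,"
        "software.amazon.awssdk:bundle:2.17.178,"
        "software.amazon.awssdk:url-connection-client:2.17.178",
    )
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,"
        "org.projectnessie.spark.extensions.NessieSpark32SessionExtensions",
    )
    .config("spark.sql.catalog.arctic", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.arctic.catalog-impl", "org.apache.iceberg.nessie.NessieCatalog")
    .config("spark.sql.catalog.arctic.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.arctic.uri", "https://nessie.dremio.cloud/repositories/<repository-id>/api/v1")
    .config("spark.sql.catalog.arctic.ref", "main")
    .config("spark.sql.catalog.arctic.authentication.type", "BEARER")
    .config("spark.sql.catalog.arctic.authentication.token", "<personal-access-token>")
    .config("spark.sql.catalog.arctic.warehouse", "s3://<bucket-name>")
    .getOrCreate()
)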

After you have connected Spark to Arctic, you can try out the Getting Started with Apache Spark and Arctic tutorial to learn the basics of Arctic.

Write and Read Tables in Apache Spark

To learn about the available support for writing tables in different versions of Spark, see Writing on the Project Nessie website.

To read tables in Apache Spark, see Reading on the Project Nessie website.
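
For a quick orientation before you visit those pages, here is a minimal sketch of writing and reading a table through the catalog configured above. It assumes a running session named spark, as in the PySpark sketch earlier; the demo namespace and trips table are invented names used only for illustration, and the Nessie pages linked above remain the authoritative reference.

Example Write and Read
# Hypothetical example: "demo" and "trips" are invented names.
spark.sql("""
    CREATE TABLE IF NOT EXISTS arctic.demo.trips (
        id BIGINT,
        distance_km DOUBLE
    ) USING iceberg
""")

# Write a couple of rows, then read them back.
spark.sql("INSERT INTO arctic.demo.trips VALUES (1, 12.5), (2, 3.2)")
spark.sql("SELECT * FROM arctic.demo.trips").show()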