
    Apache Spark preview

    To connect Apache Spark to Arctic, you need to set the following properties as you initialize Spark SQL:

    Configuration property name                           | Value                        | Purpose
    spark.sql.catalog.<catalog_name>.uri                  | Arctic catalog endpoint      | Tells the query engine the location of the Iceberg catalog.
    spark.sql.catalog.<catalog_name>.authentication.type  | BEARER                       | Tells the query engine the authentication type to use for the Iceberg catalog.
    spark.sql.catalog.<catalog_name>.authentication.token | <personal-access-token>      | Tells the query engine the authentication token to use for the Iceberg catalog.
    spark.sql.catalog.<catalog_name>.warehouse            | Path to the Amazon S3 bucket | Tells the query engine where to create tables.

    Configure the properties as follows:

    • Replace <catalog_name> with a unique value (for example, arctic or myArcticCatalog).
    • Set the authentication type to BEARER.
    • Set the authentication token to the personal access token you generated in your Dremio Cloud organization. If you have not created a personal access token, see Personal Access Tokens for information about how Dremio Cloud uses these tokens and how to generate one.
    • Set the warehouse to the location (a path to on-premises or cloud storage) where the query engine should create tables and write data.
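    The four properties above can also be assembled programmatically before launching a session. Here is a minimal Python sketch; the helper `arctic_spark_conf` and all placeholder values are illustrative, not part of Spark's or Dremio's API:

    ```python
    # Sketch: builds the four required Spark SQL properties for an Arctic
    # catalog as a dict, which can then be emitted as --conf flags or passed
    # to SparkSession.builder.config(). Helper name and values are illustrative.

    def arctic_spark_conf(catalog_name, uri, token, warehouse):
        """Return the required Spark SQL configuration properties for an Arctic catalog."""
        prefix = f"spark.sql.catalog.{catalog_name}"
        return {
            f"{prefix}.uri": uri,                        # Arctic catalog endpoint
            f"{prefix}.authentication.type": "BEARER",   # bearer-token authentication
            f"{prefix}.authentication.token": token,     # personal access token
            f"{prefix}.warehouse": warehouse,            # where tables are created
        }

    conf = arctic_spark_conf(
        catalog_name="arctic",
        uri="https://nessie.dremio.cloud/v1/repositories/<repository-id>",
        token="<personal-access-token>",
        warehouse="s3://my-bucket/warehouse",
    )
    for key, value in conf.items():
        print(f"--conf {key}={value}")
    ```

    Keeping the properties in one place like this makes it easy to swap the token or warehouse per environment without editing the launch command by hand.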

    Here is an example initialization that you can run to start up a Spark SQL session with the <catalog_name> set to arctic:

    Example Initialization
    spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.1,org.projectnessie:nessie-spark-extensions:0.44.0,software.amazon.awssdk:bundle:2.17.178,software.amazon.awssdk:url-connection-client:2.17.178 \
    --conf spark.sql.extensions="org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSpark32SessionExtensions" \
    --conf spark.sql.catalog.arctic.uri=https://nessie.dremio.cloud/v1/repositories/52e5d5db-f48d-4878-b429-815ge9fdw4c6 \
    --conf spark.sql.catalog.arctic.ref=main \
    --conf spark.sql.catalog.arctic.authentication.type=BEARER \
    --conf spark.sql.catalog.arctic.catalog-impl=org.apache.iceberg.nessie.NessieCatalog \
    --conf spark.sql.catalog.arctic.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.arctic=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.arctic.authentication.token=RDViJJHrS/u+JAwrzQVV2+kAuLxiNkbTgdWQKQhAUS72o2BMKuRWDnjuPEjACw== \
    --conf spark.sql.catalog.arctic.warehouse=s3://arctic_test_bucket

    After you have connected Spark to Arctic, you can try out the Getting Started with Apache Spark and Arctic tutorial to learn the basics of Arctic.

    Write and Read Tables in Apache Spark

    To learn about the available support for writing tables in the different versions of Spark, see Writing on the Project Nessie website.

    To read tables in Apache Spark, see Reading on the Project Nessie website.
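
    Once the session is running, tables in the catalog are addressed as <catalog_name>.<namespace>.<table>. A minimal Spark SQL sketch, assuming the catalog was registered as arctic; the demo namespace and trips table are hypothetical names, not part of any tutorial:

    ```sql
    -- Hypothetical namespace and table; assumes the catalog is registered as "arctic"
    CREATE NAMESPACE IF NOT EXISTS arctic.demo;
    CREATE TABLE arctic.demo.trips (id BIGINT, distance DOUBLE) USING iceberg;
    INSERT INTO arctic.demo.trips VALUES (1, 4.2);
    SELECT * FROM arctic.demo.trips;
    ```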