Apache Spark
To connect Apache Spark to Arctic, you need to set the following properties as you initialize Spark SQL:
AWS

| Configuration property name | Value | Purpose |
|---|---|---|
| `spark.sql.catalog.<catalog_name>.uri` | Arctic Catalog Endpoint | Specifies the location of the Iceberg catalog for the query engine. |
| `spark.sql.catalog.<catalog_name>.authentication.type` | BEARER | Specifies the authentication type to use for the Iceberg catalog. |
| `spark.sql.catalog.<catalog_name>.authentication.token` | `<personal-access-token>` | Specifies the authentication token to use for the Iceberg catalog. |
| `spark.sql.catalog.<catalog_name>.warehouse` | Path to the Amazon S3 bucket | Specifies where the query engine creates tables. |
Azure

| Configuration property name | Value | Purpose |
|---|---|---|
| `spark.sql.catalog.<catalog_name>.uri` | Arctic Catalog Endpoint | Specifies the location of the Iceberg catalog for the query engine. |
| `spark.sql.catalog.<catalog_name>.authentication.type` | BEARER | Specifies the authentication type to use for the Iceberg catalog. |
| `spark.sql.catalog.<catalog_name>.authentication.token` | `<personal-access-token>` | Specifies the authentication token to use for the Iceberg catalog. |
| `spark.sql.catalog.<catalog_name>.warehouse` | Path to the Azure Storage account | Specifies where the query engine creates tables. |
Replace the following:

- Replace `<catalog_name>` with a unique value (for example, `arctic` or `myArcticCatalog`).
- Set the authentication type to `BEARER`.
- Set the authentication token to the personal access token you generated in your Dremio Cloud organization. If you have not created a personal access token, see Personal Access Tokens for information about how Dremio Cloud uses these tokens and how to generate one.

  note: The personal access token identifies the Dremio user who can read and write data, as well as the user's role-based access control (RBAC) permissions on the Arctic catalog. Dremio enforces the user's existing Arctic RBAC permissions when they read and write data through the Apache Spark engine.

- Set the warehouse to the location (a path to on-premises or cloud storage) that the query engine should use to create tables and write data.
Here is an example initialization that you can run to start up a Spark SQL session with `<catalog_name>` set to `arctic`:

AWS
```bash
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.1,org.projectnessie:nessie-spark-extensions:0.44.0,software.amazon.awssdk:bundle:2.17.178,software.amazon.awssdk:url-connection-client:2.17.178 \
  --conf spark.sql.extensions="org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSpark32SessionExtensions" \
  --conf spark.sql.catalog.arctic.uri=https://nessie.dremio.cloud/repositories/52e5d5db-f48d-4878-b429-815ge9fdw4c6/api/v1 \
  --conf spark.sql.catalog.arctic.ref=main \
  --conf spark.sql.catalog.arctic.authentication.type=BEARER \
  --conf spark.sql.catalog.arctic.catalog-impl=org.apache.iceberg.nessie.NessieCatalog \
  --conf spark.sql.catalog.arctic.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
  --conf spark.sql.catalog.arctic=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.arctic.authentication.token=RDViJJHrS/u+JAwrzQVV2+kAuLxiNkbTgdWQKQhAUS72o2BMKuRWDnjuPEjACw== \
  --conf spark.sql.catalog.arctic.warehouse=s3://arctic-test-bucket
```
Azure

```bash
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.1,org.projectnessie:nessie-spark-extensions:0.44.0,org.apache.hadoop:hadoop-azure:3.3.6,com.microsoft.azure:azure-storage-blob:12.24.0 \
  --conf spark.sql.extensions="org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSpark32SessionExtensions" \
  --conf spark.sql.catalog.arctic.uri=https://nessie.dremio.cloud/v1/repositories/52e5d5db-f48d-4878-b429-815ge9fdw4c6 \
  --conf spark.sql.catalog.arctic.ref=main \
  --conf spark.sql.catalog.arctic.authentication.type=BEARER \
  --conf spark.sql.catalog.arctic.catalog-impl=org.apache.iceberg.nessie.NessieCatalog \
  --conf spark.sql.catalog.arctic.io-impl=org.apache.iceberg.azure.adlsv2.ADLSFileIO \
  --conf spark.sql.catalog.arctic=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.arctic.authentication.token=RDViJJHrS/u+JAwrzQVV2+kAuLxiNkbTgdWQKQhAUS72o2BMKuRWDnjuPEjACw== \
  --conf spark.sql.catalog.arctic.warehouse=wasbs://arctic-test-container@arcticstorageaccount.blob.core.windows.net/arcticdata/
```
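If you prefer to configure the session in code rather than on the command line, the same properties can be set on a SparkSession builder. The following PySpark snippet is a minimal sketch that mirrors the AWS example above; the endpoint, token, and bucket values are placeholders you must replace with your own.

```python
# A minimal PySpark sketch mirroring the AWS spark-sql example above.
# The endpoint, token, and bucket values are placeholders; replace them
# with your own before running.
from pyspark.sql import SparkSession

catalog = "arctic"  # the value you chose for <catalog_name>

spark = (
    SparkSession.builder.appName("arctic-example")
    # Same dependencies the spark-sql example passes via --packages.
    .config(
        "spark.jars.packages",
        "org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.1,"
        "org.projectnessie:nessie-spark-extensions:0.44.0,"
        "software.amazon.awssdk:bundle:2.17.178,"
        "software.amazon.awssdk:url-connection-client:2.17.178",
    )
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,"
        "org.projectnessie.spark.extensions.NessieSpark32SessionExtensions",
    )
    .config(f"spark.sql.catalog.{catalog}", "org.apache.iceberg.spark.SparkCatalog")
    .config(f"spark.sql.catalog.{catalog}.catalog-impl", "org.apache.iceberg.nessie.NessieCatalog")
    .config(f"spark.sql.catalog.{catalog}.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config(f"spark.sql.catalog.{catalog}.uri", "<arctic_catalog_endpoint>")
    .config(f"spark.sql.catalog.{catalog}.ref", "main")
    .config(f"spark.sql.catalog.{catalog}.authentication.type", "BEARER")
    .config(f"spark.sql.catalog.{catalog}.authentication.token", "<personal-access-token>")
    .config(f"spark.sql.catalog.{catalog}.warehouse", "s3://<bucket-name>")
    .getOrCreate()
)
```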
After you have connected Spark to Arctic, you can try out the Getting Started with Apache Spark and Arctic tutorial to learn the basics of Arctic.
Write and Read Tables in Apache Spark
To learn about the available support for writing tables in the different versions of Spark, see Writing on the Project Nessie website.
To read tables in Apache Spark, see Reading on the Project Nessie website.
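As a quick smoke test of writing and reading, you can create, populate, and query an Iceberg table through the catalog. This is a minimal sketch that assumes the `spark` session from the PySpark example above; the namespace and table names (`db`, `employees`) are illustrative.

```python
# A minimal write/read smoke test, assuming the `spark` session configured
# in the sketch above. The namespace and table names are illustrative.
spark.sql("CREATE NAMESPACE IF NOT EXISTS arctic.db")
spark.sql(
    "CREATE TABLE IF NOT EXISTS arctic.db.employees (id BIGINT, name STRING) USING iceberg"
)
spark.sql("INSERT INTO arctic.db.employees VALUES (1, 'Ada'), (2, 'Grace')")
spark.sql("SELECT * FROM arctic.db.employees").show()
```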