Google BigLake Catalog (Preview)

Google BigLake is an Iceberg lakehouse built on top of the Google Cloud Storage platform.

Dremio creates connections to Google BigLake using its Iceberg REST Catalog connector.

Connect to a Google BigLake Catalog

  1. In the Dremio console, click Add Data on the Home page.
  2. In the Add Data dialog, select Iceberg REST Catalog.
  3. Configure the connection using the sections below, then click Save.

General

To configure the source connection:

  1. For Name, enter a name for the source. The name must be unique in the organization and cannot be changed after the source is created, so choose one that is easy for users to reference. The name cannot exceed 255 characters and must contain only the following characters: 0-9, A-Z, a-z, underscore (_), or hyphen (-).
  2. For Endpoint URI, specify the catalog service URI as https://biglake.googleapis.com/iceberg/v1/restcatalog.
  3. By default, Use vended credentials is enabled. This allows Dremio to connect to the catalog and receive temporary credentials for the underlying storage location. If this option is enabled, you do not need to add storage authentication in Advanced Options.
  4. For Allowed Namespaces, add your namespace and uncheck the Allowed Namespaces include their whole subtrees option.
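
The naming rules in step 1 can be checked before you open the dialog. The following is a minimal Python sketch, not part of Dremio: the function name is_valid_source_name is hypothetical, and it assumes that an empty name is also rejected, which the rules above do not state explicitly. It does not check uniqueness within the organization.

```python
import re

# Rules from the Name field: at most 255 characters, and only
# 0-9, A-Z, a-z, underscore (_), or hyphen (-).
# Assumption: an empty name is treated as invalid.
_SOURCE_NAME_PATTERN = re.compile(r"[0-9A-Za-z_-]{1,255}")


def is_valid_source_name(name: str) -> bool:
    """Return True if the name satisfies the length and character rules."""
    return _SOURCE_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("biglake_prod-1", "big lake", "a" * 256):
        print(repr(candidate[:20]), is_valid_source_name(candidate))
```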

Advanced Options

The values you set below depend on your Google BigLake Catalog settings. If you left Use vended credentials enabled on the General tab and your Google BigLake catalog is configured with credential vending mode, follow the Vended Credentials Catalog setup below. If you disabled Use vended credentials on the General tab and your Google BigLake catalog is configured with end-user credentials, follow the End User Catalog setup below.

Replace the placeholders inside <...> with your respective values. For example, a warehouse value could be gs://yourstoragelocationhere.

  • warehouse (property)

    • Value: <warehouse>
    • Description: Google BigLake Catalog location
  • rest.auth.type (property)

    • Value: org.apache.iceberg.gcp.auth.GoogleAuthManager
    • Description: Required value for a Google BigLake Catalog source
  • header.x-goog-user-project (property)

    • Value: <project>
    • Description: Google Cloud project where the catalog is located
  • gcp.auth.credentials-json (credential)

    • Value: <your_ADC_JSON_here>
    • Description: Credentials JSON that allows Dremio to authenticate with the catalog
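
To keep the values consistent before entering them in the dialog, it can help to collect them in one place. The sketch below is only an illustration in Python and is not a Dremio API: the function build_advanced_options and the example bucket and project names are hypothetical. It keeps the credential separate from the plain properties, mirroring the (property) and (credential) labels above, and checks only that the credentials text parses as JSON.

```python
import json


def build_advanced_options(warehouse: str, project: str, credentials_json: str):
    """Collect the Advanced Options values listed above.

    Returns a (properties, credentials) pair. This only gathers values to
    enter in the Dremio dialog; it does not contact Dremio or Google Cloud.
    """
    # Fail early if the credentials text is not valid JSON (raises ValueError).
    json.loads(credentials_json)

    properties = {
        "warehouse": warehouse,
        "rest.auth.type": "org.apache.iceberg.gcp.auth.GoogleAuthManager",
        "header.x-goog-user-project": project,
    }
    credentials = {"gcp.auth.credentials-json": credentials_json}
    return properties, credentials


if __name__ == "__main__":
    # Placeholder values for illustration only.
    props, creds = build_advanced_options(
        warehouse="gs://yourstoragelocationhere",
        project="example-project",
        credentials_json='{"type": "authorized_user"}',
    )
    for key, value in props.items():
        print(f"{key} = {value}")
    print("credential keys:", list(creds))
```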

Cache Options

  • Enable local caching when possible: Selected by default. Along with asynchronous access for cloud caching, local caching can improve query performance.
  • Max percent of total available cache space to use when possible: Specifies the disk quota, as a percentage, available on any single executor node when local caching is enabled. The default is 100 percent of the total disk space available on the mount point provided for caching. You can either manually enter a percentage in the value field or use the arrows to the far right to adjust the percentage.
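
As a rough illustration of the percentage setting, the following Python sketch converts a percentage into a byte quota for one executor node. The function name cache_quota_bytes and the 500 GiB figure are hypothetical, and the arithmetic is a simple reading of the description above rather than Dremio's internal calculation.

```python
def cache_quota_bytes(total_disk_bytes: int, max_percent: int = 100) -> int:
    """Return the cache quota for one executor node, in bytes.

    total_disk_bytes: disk space available on the caching mount point.
    max_percent: the value entered in the dialog (default 100).
    """
    if not 0 <= max_percent <= 100:
        raise ValueError("max_percent must be between 0 and 100")
    return total_disk_bytes * max_percent // 100


if __name__ == "__main__":
    gib = 1024 ** 3
    # A node with 500 GiB on the cache mount point, limited to 80 percent.
    print(cache_quota_bytes(500 * gib, 80) // gib, "GiB")
```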

Reflection Refresh

You can set the policy that controls how often reflections are scheduled to be refreshed automatically, as well as the time limit after which reflections expire and are removed. See the following options:

Option | Description
Never refresh | Select to prevent automatic reflection refresh. The default is to automatically refresh.
Refresh every | How often to refresh reflections, specified in hours, days, or weeks. This option is ignored if Never refresh is selected.
Set refresh schedule | Specify the daily or weekly schedule.
Never expire | Select to prevent reflections from expiring. The default is to automatically expire after the time limit below.
Expire after | The time limit after which reflections expire and are removed from Dremio, specified in hours, days, or weeks. This option is ignored if Never expire is selected.

Metadata

Metadata options are configured using the following settings.

Dataset Handling

  • Remove dataset definitions if underlying data is unavailable: Selected by default. If this box is not checked and the underlying files under a folder are removed, or the folder or source is not accessible, Dremio does not remove the dataset definitions. Leaving it unchecked is useful when files are temporarily deleted and then replaced with a new set of files.

Metadata Refresh

These are the optional Metadata Refresh parameters:

  • Dataset Discovery: The refresh interval for fetching top-level source object names such as databases and tables. Set the time interval using this parameter.

    Parameter | Description
    Fetch every | You can set the frequency to fetch object names in minutes, hours, days, or weeks. The default frequency to fetch object names is 1 hour.
  • Dataset Details: The metadata that Dremio needs for query planning, such as information needed for fields, types, shards, statistics, and locality. These are the parameters to fetch the dataset information.

    Parameter | Description
    Fetch mode | You can fetch only from queried datasets. Dremio updates details for previously queried objects in a source. By default, this is set to Only Queried Datasets.
    Fetch every | You can set the frequency to fetch dataset details in minutes, hours, days, or weeks. The default frequency to fetch dataset details is 1 hour.
    Expire after | You can set the expiry time of dataset details in minutes, hours, days, or weeks. The default expiry time of dataset details is 3 hours.
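
To show how the default intervals above relate to one another, here is a small Python sketch. It is an illustration only: the class MetadataRefreshPolicy, its field names, and the three state labels are hypothetical and do not describe how Dremio schedules refreshes internally. The defaults are the ones stated above (fetch every 1 hour, expire after 3 hours).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class MetadataRefreshPolicy:
    # Defaults taken from the Metadata Refresh parameters above.
    discovery_fetch_every: timedelta = timedelta(hours=1)
    details_fetch_every: timedelta = timedelta(hours=1)
    details_expire_after: timedelta = timedelta(hours=3)

    def details_state(self, last_fetched: datetime, now: datetime) -> str:
        """Classify dataset details by age (hypothetical labels)."""
        age = now - last_fetched
        if age >= self.details_expire_after:
            return "expired"
        if age >= self.details_fetch_every:
            return "due for refresh"
        return "fresh"


if __name__ == "__main__":
    policy = MetadataRefreshPolicy()
    fetched = datetime(2024, 1, 1, 12, 0)
    for minutes in (30, 90, 240):
        state = policy.details_state(fetched, fetched + timedelta(minutes=minutes))
        print(f"{minutes} minutes after fetch: {state}")
```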

Privileges

This connection inherits privileges from Project settings. To grant specific users or roles additional privileges in this connection:

  1. Enter the username or role name that you want to grant access to and click the Add to Privileges button. The added user or role is displayed in the USERS/ROLES table.
  2. For the users or roles in the USERS/ROLES table, toggle the checkmark for each privilege you want to grant on the Dremio source that is being created.
  3. Click Save after setting the configuration.

See Privileges for additional information about privileges.

Edit a Google BigLake Catalog Connection

  1. On the Open Catalog page, under Connections, right-click the connection and select Settings.
  2. Update the connection configuration as needed.
  3. Click Save.

Delete a Google BigLake Catalog Connection

Note: If the source is in a bad state (for example, Dremio cannot authenticate to the source or the source is otherwise unavailable), only users who belong to the ADMIN role can delete the source.

  1. On the Open Catalog page, under Connections, right-click the connection and select Delete.
  2. Click Delete to confirm.