    Google Cloud Storage (GCS)

    Dremio integrates with environments that store data in the Google Cloud Storage (GCS) web service. Configuring this source gives you direct access to GCS data through the Dremio interface.

    Software Requirements

    General Tab

    The following options are available from the General tab.

    Field | Description
    Name | A name to identify the data source in Dremio.
    Google Project ID | The ID of your Google Cloud project. You can find it in the **Project info** pane at the top left of the Google Cloud Console Home page.
    Service Account Keys | The most common method of integrating Dremio with GCS is to create a service account key. When this option is selected, you need to provide values for the Client Email, Client ID, Private Key ID, and Private Key fields. To obtain a service account key to fill in these fields, follow the steps in Creating Service Account Keys.
    Automatic/Service Account | Select this option if you are running Dremio on a Google Compute Engine instance. Dremio then uses the instance's active service account and does not require any additional information to integrate with your data.
    Client Email | The email address associated with the GCS service account. Required only when the Service Account Keys radio button is selected.
    Client ID | The client ID for your key pair. You obtain this value by following the steps below to create a service account key. Required only when the Service Account Keys radio button is selected.
    Private Key ID | The key ID for your key pair. You obtain this value by following the steps below to create a service account key. Required only when the Service Account Keys radio button is selected.
    Private Key | The private key for your key pair. You obtain this value by following the steps below to create a service account key. Required only when the Service Account Keys radio button is selected.

    Creating Service Account Keys

    To let Dremio access your Google Cloud Storage source, you first need a way to identify the service account. This is done by creating a public/private key pair. When you create a service account key, the public portion is stored on Google Cloud, while the private portion is made available to you for entry in Dremio.

    The steps below outline the simplest method of creating a service account key.

    1. From the Google Cloud Console, navigate to the Service Accounts page.
    2. Select the desired project.
    3. Click on the email address of the service account that you’ll be creating a key for.
    4. Click on the Keys tab.
    5. Click the Add Key drop-down menu and then select Create new key.
    6. Select JSON as the Key Type and then click Create.

    Your browser then downloads a service account key file. It should look similar to the example below:

    {
      "type": "service_account",
      "project_id": "project-id",
      "private_key_id": "key-id",
      "private_key": "-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n",
      "client_email": "service-account-email",
      "client_id": "client-id",
      "auth_uri": "https://accounts.google.com/o/oauth2/auth",
      "token_uri": "https://accounts.google.com/o/oauth2/token",
      "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
      "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account-email"
    }
    

    Copy and paste each value from this file to the corresponding fields on the Dremio interface.
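
    To reduce copy-and-paste mistakes, the values can also be pulled out of the key file programmatically. The following is a minimal sketch, not part of Dremio: the function name `dremio_fields_from_key` and the `DREMIO_FIELDS` mapping are hypothetical, and the mapping assumes that the JSON keys shown in the example above correspond to the field labels on the General tab. It also assumes that the key file's `project_id` is the project that holds your GCS data, which may not be true if the service account lives in a different project.

```python
import json

# Assumed mapping from Dremio General tab labels to key-file JSON keys.
DREMIO_FIELDS = {
    "Google Project ID": "project_id",
    "Client Email": "client_email",
    "Client ID": "client_id",
    "Private Key ID": "private_key_id",
    "Private Key": "private_key",
}


def dremio_fields_from_key(key_json: str) -> dict:
    """Return the values to enter in Dremio, keyed by field label."""
    key = json.loads(key_json)
    if key.get("type") != "service_account":
        raise ValueError("not a service account key file")
    missing = [name for name in DREMIO_FIELDS.values() if name not in key]
    if missing:
        raise ValueError("key file is missing: " + ", ".join(missing))
    return {label: key[name] for label, name in DREMIO_FIELDS.items()}


if __name__ == "__main__":
    # Placeholder values matching the example key file above.
    sample = json.dumps({
        "type": "service_account",
        "project_id": "project-id",
        "private_key_id": "key-id",
        "private_key": "-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n",
        "client_email": "service-account-email",
        "client_id": "client-id",
    })
    for label, value in dremio_fields_from_key(sample).items():
        # Avoid echoing the private key to the terminal.
        shown = "<hidden>" if label == "Private Key" else value
        print(f"{label}: {shown}")
```

    In practice you would read the downloaded file with `open(path).read()` and pass its contents to the function. Treat the key file as a secret and delete local copies once the values are stored in Dremio.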

    For additional methods of creating a key (e.g., gcloud tool, REST APIs, etc.), view Google’s documentation.

    Advanced Options Tab

    The following settings control more advanced functionality in Dremio.

    Field | Description
    Enable asynchronous access when possible | Allows multiple queries to run simultaneously rather than waiting for a single query to complete before new tasks are performed.
    Root Path | The root path for the GCS source.
    Properties | Additional connection properties, each consisting of a property name and its value.
    Whitelisted buckets | A list of buckets to whitelist, that is, the buckets that Dremio is allowed to access.
    Cache Options
    Enable local caching when possible | Dremio caches locally any data used from the source.
    Max percent of total available cache space to use when possible | Sets the maximum share of the available cache space, as a percentage, that Dremio uses for local caching. By default, this is set to 100.

    Reflection Refresh Tab

    This tab controls how often reflections are refreshed and how long they remain usable for queries performed against this data source.

    Field | Description
    Never refresh | Prevents any query reflections associated with this source from refreshing.
    Refresh every | Sets the time interval at which reflections for this source are refreshed. This may be set in hours, days, or weeks.
    Never expire | Prevents any query reflections associated with this source from expiring.
    Expire after | Sets how long after its creation a reflection expires and can no longer be used for queries. This may be set in hours, days, or weeks.
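
    The interaction between the refresh and expiry settings can be illustrated with a little date arithmetic. This is only a sketch under stated assumptions: the function `reflection_schedule` is hypothetical and is not a Dremio API, both intervals are assumed to be measured from the reflection's creation time, and `None` stands in for the Never refresh and Never expire options.

```python
from datetime import datetime, timedelta


def reflection_schedule(created_at, refresh_every=None, expire_after=None):
    """Return (next_refresh, expires_at); None models the 'Never' options."""
    next_refresh = created_at + refresh_every if refresh_every is not None else None
    expires_at = created_at + expire_after if expire_after is not None else None
    return next_refresh, expires_at


created = datetime(2024, 1, 1, 12, 0)
next_refresh, expires_at = reflection_schedule(
    created,
    refresh_every=timedelta(hours=1),  # "Refresh every" 1 hour
    expire_after=timedelta(hours=3),   # "Expire after" 3 hours
)
print(next_refresh)  # 2024-01-01 13:00:00
print(expires_at)    # 2024-01-01 15:00:00
```

    Under these assumptions, an expiry interval shorter than the refresh interval would leave a window in which the reflection has expired but has not yet been refreshed.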

    Metadata Tab

    This tab offers settings that control how dataset details are fetched and refreshed.

    Field | Description
    Dataset Handling
    Remove dataset definitions if underlying data is unavailable | If this box is not checked and the files under a folder are removed, or the folder or source is not accessible, Dremio does not remove the dataset definitions. Leaving the box unchecked is useful when files are temporarily deleted and then replaced with new sets of files.
    Automatically format files into physical datasets when users issue queries | If this box is checked and a query runs against an un-promoted PDS or folder, Dremio automatically promotes it using default options. If you have CSV files, especially ones that need non-default options, it might be useful to leave this box unchecked.
    Metadata Refresh
    Dataset Discovery | Specifies the refresh interval for top-level source object names, such as database and table names. This is a lightweight operation.
    • Fetch every. Specifies the time interval at which Dremio fetches object names. This can be set in minutes, hours, days, or weeks.
    Dataset Details | Specifies the metadata that Dremio needs for query planning, such as information regarding fields, types, shards, statistics, and locality.
    • Fetch mode. Restricts when metadata is retrieved.
      • Only Queried Datasets. Dremio updates metadata details only for previously queried objects in a source. This mode increases query performance because less work needs to be done at query time for these datasets.
      • All Datasets (deprecated). Dremio updates the details for all datasets in a source. This mode increases query performance because less work needs to be done at query time.
    • Fetch every. Specifies the time interval at which metadata is fetched. This can be set in minutes, hours, days, or weeks.
    • Expire after. Specifies how long after a dataset is queried its details expire. This can be set in minutes, hours, days, or weeks.

    Privileges Tab

    From this tab, administrators may control access to the data source on a user-by-user or group-by-group basis.

    For additional information, view the Users, Groups, and Roles page. If you’re using Dremio v16.0 or later, view the new Access Control functionality instead.