Google BigQuery
Dremio supports connecting to Google BigQuery as an external source. The connector authenticates with Google service account keys. For more information about creating service account keys, see Create and delete service account keys.
Configuring Google BigQuery as a Source
- In the bottom-left corner of the Datasets page, click Add Source.
- Under Databases in the Add Data Source dialog, select Google BigQuery.
General
- In the Name field, specify the name by which you want the Google BigQuery source to appear in the list of data sources. The name cannot include the following special characters: /, :, [, or ].
- Under Connection, follow these steps:
  - In the Host field, specify the URL for the Google BigQuery source.
  - In the Port field, specify the port to use. The default port is 443.
  - In the Project Id field, specify the Google Cloud Project ID.
- Under Authentication, specify the service account Client Email and the Service Account Key (JSON).
Note: This connector assumes that the Service Account Key is a JSON Web Key. For more information on Google Cloud service account credentials, see Service account credentials.
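Before filling in the General page, it can help to check the values you plan to enter. The following is a minimal sketch, not part of Dremio: the helper names `validate_source_name` and `read_key_fields` are hypothetical, and the sketch assumes the key is a standard Google Cloud service account key file, which contains the `project_id`, `client_email`, and `private_key` fields.

```python
import json

# Characters that the Name field does not accept, per the General section.
FORBIDDEN_NAME_CHARS = set("/:[]")


def validate_source_name(name: str) -> None:
    """Raise ValueError if the name contains a forbidden special character."""
    bad = sorted(set(name) & FORBIDDEN_NAME_CHARS)
    if bad:
        raise ValueError(f"source name contains forbidden characters: {bad}")


def read_key_fields(key_json: str) -> dict:
    """Extract the Project ID and Client Email from service account key JSON.

    Assumes the standard Google Cloud key file layout (hypothetical helper,
    not a Dremio API).
    """
    key = json.loads(key_json)
    required = ("project_id", "client_email", "private_key")
    missing = [field for field in required if field not in key]
    if missing:
        raise ValueError(f"service account key is missing fields: {missing}")
    return {
        "project_id": key["project_id"],
        "client_email": key["client_email"],
    }
```

To use it, pass the contents of the downloaded key file to `read_key_fields`, then copy the returned values into the Project Id and Client Email fields and paste the full JSON into the Service Account Key (JSON) field.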
Advanced Options
On the Advanced Options page, you can set values for these non-required options:
| Option | Description |
|---|---|
| Record fetch size | Number of records to fetch at once. Set to 0 (zero) to have Dremio automatically decide. By default, this is set to 200. |
| Maximum Idle Connections | The total number of connections allowed to be idle at a given time. The default number of maximum idle connections is 8. |
| Connection Idle Time | The amount of time (in seconds) allowed for a connection to remain idle before the connection is terminated. The default connection idle time is 6 seconds. |
| Query Timeout | The amount of time (in seconds) allowed to wait for the results of a query. If this time expires, the connection being used is returned to an idle state. Set Query Timeout to 0 for no timeout. The default Query Timeout is 0. |
Reflection Refresh
On the Reflection Refresh page, set the policy that controls how often Reflections are scheduled to be refreshed automatically, as well as the time limit after which Reflections expire and are removed.
| Option | Description |
|---|---|
| Never refresh | Select to prevent automatic Reflection refresh. Otherwise, Reflections are refreshed automatically by default. |
| Refresh every | How often to refresh Reflections, specified in hours, days or weeks. This option is ignored if Never refresh is selected. |
| Set refresh schedule | Specify the daily or weekly schedule. |
| Never expire | Select to prevent Reflections from expiring. Otherwise, Reflections expire automatically by default after the time limit specified in Expire after. |
| Expire after | The time limit after which Reflections expire and are removed from Dremio, specified in hours, days or weeks. This option is ignored if Never expire is selected. |
Metadata
On the Metadata page, you can configure settings to refresh metadata and handle datasets.
Dataset Handling
These are the optional Dataset Handling parameters.
| Parameter | Description |
|---|---|
| Remove dataset definitions if underlying data is unavailable | By default, Dremio removes dataset definitions if underlying data is unavailable. This is useful when files are temporarily deleted and added back in the same location with new sets of files. |
Metadata Refresh
These are the optional Metadata Refresh parameters:
- Dataset Discovery: The refresh interval for fetching top-level source object names such as databases and tables. Set the time interval using this parameter.

  | Parameter | Description |
  |---|---|
  | (Optional) Fetch every | You can choose to set the frequency to fetch object names in minutes, hours, days, or weeks. The default frequency to fetch object names is 1 hour. |

- Dataset Details: The metadata that Dremio needs for query planning such as information required for fields, types, shards, statistics, and locality. These are the parameters to fetch the dataset information.

  | Parameter | Description |
  |---|---|
  | Fetch mode | You can choose to fetch only from queried datasets, which is the default. Dremio updates details for previously queried objects in a source. |
  | Fetch every | You can choose to set the frequency to fetch dataset details in minutes, hours, days, or weeks. The default frequency to fetch dataset details is 1 hour. |
  | Expire after | You can choose to set the expiry time of dataset details in minutes, hours, days, or weeks. The default expiry time of dataset details is 3 hours. |
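To see how the Fetch every and Expire after defaults for Dataset Details relate to each other, consider the following sketch. It is an illustration under stated assumptions, not Dremio's implementation: the function `details_status` and its status labels are hypothetical, and only the 1 hour and 3 hour defaults come from the parameters above.

```python
from datetime import datetime, timedelta

# Defaults from the Dataset Details parameters above.
FETCH_EVERY = timedelta(hours=1)
EXPIRE_AFTER = timedelta(hours=3)


def details_status(
    last_refreshed: datetime,
    now: datetime,
    fetch_every: timedelta = FETCH_EVERY,
    expire_after: timedelta = EXPIRE_AFTER,
) -> str:
    """Classify dataset details by age (hypothetical labels for illustration)."""
    age = now - last_refreshed
    if age >= expire_after:
        return "expired"
    if age >= fetch_every:
        return "refresh due"
    return "fresh"
```

With the defaults, details refreshed 30 minutes ago are still within the fetch interval, details refreshed 90 minutes ago are due for another fetch, and details that have gone 3 hours without a refresh have reached the expiry time.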