Amazon OpenSearch Service
Amazon OpenSearch Service is a managed service that makes it easy to deploy, operate, and scale OpenSearch clusters in the AWS Cloud.
Compatibility
Dremio supports the following Amazon OpenSearch Service versions:
- 5.x
- 6.0
- 6.2
- 6.3
- 7.0+
Amazon OpenSearch is supported as a data source in Dremio Software on-premises deployments.
In Dremio Software versions 21.3.0+ and 22.0.3+, Amazon OpenSearch is also supported as a data source in AWS Edition.
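As a quick way to reason about the version list in this section, the following sketch encodes it as a check. The helper name `is_supported` is hypothetical and not part of Dremio. It mirrors the list literally, so a version such as 6.1 is reported as unsupported because it is not listed.

```python
import re

# Minor versions of the 6.x line that appear in the compatibility list.
SUPPORTED_6_MINORS = {0, 2, 3}


def is_supported(version: str) -> bool:
    """Return True if a version string such as '6.2.4' matches the documented list."""
    match = re.match(r"^(\d+)\.(\d+)", version.strip())
    if not match:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    if major == 5:
        return True                         # 5.x
    if major == 6:
        return minor in SUPPORTED_6_MINORS  # 6.0, 6.2, 6.3
    return major >= 7                       # 7.0+


if __name__ == "__main__":
    for candidate in ("5.6.16", "6.1.0", "6.3.2", "7.10.2"):
        print(candidate, is_supported(candidate))
```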
Configuring Amazon OpenSearch Service as a Source
- On the Datasets page, to the right of Sources in the left panel, click the Add Source icon.
- In the Add Data Source dialog, under Databases, select Amazon OpenSearch Service.
General
On the General tab, enter a name for the source, connection details, and authentication credentials. The name cannot include the following special characters: /, :, [, or ].
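If source names are generated by a script before being entered in the dialog, the naming rule above can be checked ahead of time. This is a minimal sketch; the function names are hypothetical, and it checks only the four characters listed above, not any other rules Dremio may enforce.

```python
# Characters that the General tab does not allow in a source name.
FORBIDDEN_CHARS = set("/:[]")


def invalid_chars(name: str) -> list:
    """Return the forbidden characters found in the name, sorted and de-duplicated."""
    return sorted({ch for ch in name if ch in FORBIDDEN_CHARS})


def is_valid_source_name(name: str) -> bool:
    """Return True if the name contains none of the forbidden characters."""
    return not invalid_chars(name)


if __name__ == "__main__":
    print(is_valid_source_name("opensearch_prod"))
    print(invalid_chars("logs/2024:[eu]"))
```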
Connection
Name | Description |
---|---|
Host | AWS OpenSearch Host name. |
Port | Port on which the AWS OpenSearch service is running (usually 443). |
Authentication
Choose one of the following authentication methods:
- AWS Access Key: Used for key-based authentication.
  - Under AWS Access Key, enter the AWS access key ID.
  - Under AWS Access Secret, provide the AWS access secret using one of the following methods:
    - Dremio: Provide the AWS access secret in plain text. Dremio stores the AWS access secret.
    - Azure Key Vault: Provide the URI for the Azure Key Vault secret that stores the AWS access secret. The URI format is https://<vault_name>.vault.azure.net/secrets/<secret_name> (for example, https://myvault.vault.azure.net/secrets/mysecret).
      Note: To use Azure Key Vault as your application secret store, you must:
      - Deploy Dremio on Azure.
      - Complete the Requirements for Authenticating with Azure Key Vault.
      It is not necessary to restart the Dremio coordinator when you rotate secrets stored in Azure Key Vault. Read Requirements for Secrets Rotation for more information.
    - AWS Secrets Manager: Provide the Amazon Resource Name (ARN) for the AWS Secrets Manager secret that holds the AWS access secret, which is available in the AWS web console or using command line tools.
    - HashiCorp Vault: Choose the HashiCorp secrets engine you're using from the dropdown menu and enter the secret reference for the AWS access secret in the correct format in the provided field.
  - Under IAM Role to Assume, enter the IAM role that Dremio should assume in conjunction with the AWS Access Key authentication method.
- EC2 Metadata: Dremio uses the IAM role attached to the EC2 instance on which it runs.
- AWS Profile: Dremio sources profile credentials from the specified AWS profile. For information on how to set up a configuration or credentials file for AWS, see AWS Custom Authentication.
  - AWS Profile (Optional): The AWS profile name. If this is left blank, the default profile is used. For more information about using profiles in a credentials or configuration file, see AWS's documentation on Configuration and credential file settings.
- No Authentication: No credentials required.
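Before pasting a secret reference into the AWS Access Secret field, it can help to check that it has the expected shape. The sketch below recognizes the Azure Key Vault URI format quoted above and the general shape of an AWS Secrets Manager ARN. The function name and the exact character classes in the patterns are assumptions made for illustration; they are not taken from Dremio, and the example ARN is made up.

```python
import re

# URI shape quoted in the Azure Key Vault option:
# https://<vault_name>.vault.azure.net/secrets/<secret_name>
# Assumption: vault and secret names use letters, digits, and hyphens.
AZURE_KV_URI = re.compile(
    r"^https://(?P<vault>[A-Za-z0-9-]+)\.vault\.azure\.net"
    r"/secrets/(?P<secret>[A-Za-z0-9-]+)$"
)

# General shape of a Secrets Manager ARN:
# arn:<partition>:secretsmanager:<region>:<account-id>:secret:<name>
SECRETS_MANAGER_ARN = re.compile(
    r"^arn:(?P<partition>aws[a-z-]*):secretsmanager:"
    r"(?P<region>[a-z0-9-]+):(?P<account>\d{12}):secret:(?P<name>.+)$"
)


def classify_secret_reference(reference: str) -> str:
    """Return a label describing which secret store the reference appears to target."""
    reference = reference.strip()
    if AZURE_KV_URI.match(reference):
        return "azure-key-vault"
    if SECRETS_MANAGER_ARN.match(reference):
        return "aws-secrets-manager"
    return "unknown"


if __name__ == "__main__":
    print(classify_secret_reference("https://myvault.vault.azure.net/secrets/mysecret"))
    # Made-up ARN for illustration only.
    print(classify_secret_reference(
        "arn:aws:secretsmanager:us-east-1:123456789012:secret:dremio/opensearch-AbCdEf"
    ))
```

A result of "unknown" only means the string did not match either pattern; it may still be a valid plain-text secret or HashiCorp Vault reference.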
Optionally, select the option to perform keyword searches when pushing down fields mapped as text and keyword.
Advanced Options
On the Advanced Options tab, enter the options specific to the OpenSearch Service, encryption, and AWS.
OpenSearch options
- Show hidden indices that start with a dot (.).
- Use Painless scripting with OpenSearch 5.0+ (selected by default).
- Show _id columns.
- Use index/doc fields when pushing down aggregates and filters on analyzed and normalized fields (may produce unexpected results).
- Use scripts for query pushdown (selected by default).
- If the number of records returned from OpenSearch is less than the expected number, warn instead of failing the query.
- Read timeout (seconds) (default: 60)
- Scroll timeout (seconds) (default: 300)
- Scroll size -- This setting must be less than or equal to your OpenSearch value for the index.max_result_window setting. (default: 4000)
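The scroll size constraint above can be expressed as a simple check. This is a sketch with a hypothetical helper name. The value 10000 is assumed here as the OpenSearch default for index.max_result_window, so confirm the actual value configured on your indices before relying on it.

```python
DEFAULT_SCROLL_SIZE = 4000         # Dremio default from the option list above
ASSUMED_MAX_RESULT_WINDOW = 10000  # assumption: OpenSearch default for index.max_result_window


def check_scroll_size(scroll_size: int,
                      max_result_window: int = ASSUMED_MAX_RESULT_WINDOW) -> int:
    """Return scroll_size if it fits within max_result_window, else raise ValueError."""
    if scroll_size > max_result_window:
        raise ValueError(
            f"Scroll size {scroll_size} exceeds index.max_result_window "
            f"({max_result_window})"
        )
    return scroll_size


if __name__ == "__main__":
    print(check_scroll_size(DEFAULT_SCROLL_SIZE))
    try:
        check_scroll_size(20000)
    except ValueError as error:
        print(error)
```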
Encryption
Validation modes include:
- Validate certificate and hostname (default)
- Validate certificate only
- Do not validate certificate or hostname
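To illustrate what the three validation modes mean in TLS terms, the sketch below maps each one to a Python `ssl.SSLContext`. It is an analogy rather than Dremio's implementation, and the mode labels passed to the function are hypothetical names chosen for this example.

```python
import ssl


def build_ssl_context(mode: str) -> ssl.SSLContext:
    """Build an SSLContext corresponding to one of the three validation modes."""
    # The default context verifies both the certificate chain and the hostname.
    context = ssl.create_default_context()
    if mode == "certificate_and_hostname":
        return context
    if mode == "certificate_only":
        # Keep chain verification but skip the hostname match.
        context.check_hostname = False
        return context
    if mode == "none":
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
        return context
    raise ValueError(f"Unknown validation mode: {mode}")


if __name__ == "__main__":
    for mode in ("certificate_and_hostname", "certificate_only", "none"):
        ctx = build_ssl_context(mode)
        print(mode, ctx.check_hostname, ctx.verify_mode)
```

Disabling hostname or certificate validation removes protection against impersonated endpoints, so the weaker modes are usually reserved for test environments.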
AWS
- Overwrite region -- If the box is checked, provide the region.
Reflection Refresh
- Never refresh -- Select to prevent automatic Reflection refreshes. Otherwise, specify how often to refresh in hours, days, or weeks.
- Never expire -- Select to prevent Reflections from expiring. Otherwise, specify when they expire in hours, days, or weeks.
Metadata
Dataset Handling
- Remove dataset definitions if underlying data is unavailable (Default).
If this box is not checked and the underlying files under a folder are removed or the folder/source is not accessible, Dremio does not remove the dataset definitions. This option is useful in cases when files are temporarily deleted and put back in place with new sets of files.
Metadata Refresh
- Dataset Discovery -- Refresh interval for top-level source object names such as names of DBs and tables.
  - Fetch every -- Specify fetch time based on minutes, hours, days, or weeks. Default: 1 hour
- Dataset Details -- The metadata that Dremio needs for query planning, such as information needed for fields, types, shards, statistics, and locality.
  - Fetch mode -- Specify either Only Queried Datasets, All Datasets, or As Needed. Default: Only Queried Datasets
    - Only Queried Datasets -- Dremio updates details for previously queried objects in a source. This mode increases query performance because less work is needed at query time for these datasets.
    - All Datasets -- Dremio updates details for all datasets in a source. This mode increases query performance because less work is needed at query time.
    - As Needed -- Dremio updates details for a dataset at query time. This mode minimizes metadata queries on a source when not used, but might lead to longer planning times.
  - Fetch every -- Specify fetch time based on minutes, hours, days, or weeks. Default: 1 hour
  - Expire after -- Specify expiration time based on minutes, hours, days, or weeks. Default: 3 hours
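To make the relationship between the Fetch every and Expire after defaults concrete, the sketch below classifies cached dataset details by age. It is a simplified model for illustration only; the function name and state labels are hypothetical, and Dremio's actual refresh scheduling may differ.

```python
from datetime import datetime, timedelta

FETCH_EVERY = timedelta(hours=1)   # default "Fetch every" from the list above
EXPIRE_AFTER = timedelta(hours=3)  # default "Expire after" from the list above


def metadata_state(last_fetched: datetime, now: datetime,
                   fetch_every: timedelta = FETCH_EVERY,
                   expire_after: timedelta = EXPIRE_AFTER) -> str:
    """Classify cached metadata as 'fresh', 'due for refresh', or 'expired'."""
    age = now - last_fetched
    if age >= expire_after:
        return "expired"
    if age >= fetch_every:
        return "due for refresh"
    return "fresh"


if __name__ == "__main__":
    fetched = datetime(2024, 1, 1, 12, 0)
    for minutes in (30, 90, 200):
        print(minutes, metadata_state(fetched, fetched + timedelta(minutes=minutes)))
```

With the defaults, a refresh that fails once or twice still leaves usable metadata, because expiry is set to three times the fetch interval.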
Privileges
On the Privileges tab, you can grant privileges to specific users or roles. See Access Controls for additional information about privileges.
All privileges are optional.
- For Privileges, enter the user name or role name that you want to grant access to and click the Add to Privileges button. The added user or role is displayed in the USERS/ROLES table.
- For the users or roles in the USERS/ROLES table, toggle the checkmark for each privilege you want to grant on the Dremio source that is being created.
- Click Save after setting the configuration.
Updating an Amazon OpenSearch Service Source
To update an Amazon OpenSearch Service source:
- On the Datasets page, under Databases in the panel on the left, find the name of the source you want to update.
- Right-click the source name and select Settings from the list of actions. Alternatively, click the source name and then click the Settings icon at the top right corner of the page.
- In the Source Settings dialog, edit the settings you wish to update. Dremio does not support updating the source name. For information about the settings options, see Configuring Amazon OpenSearch Service as a Source.
- Click Save.
Deleting an Amazon OpenSearch Service Source
If the source is in a bad state (for example, Dremio cannot authenticate to the source or the source is otherwise unavailable), only users who belong to the ADMIN role can delete the source.
To delete an Amazon OpenSearch Service source, perform these steps:
- On the Datasets page, click Sources > Databases in the panel on the left.
- In the list of data sources, hover over the name of the source you want to remove and right-click.
- From the list of actions, click Delete.
- In the Delete Source dialog, click Delete to confirm that you want to remove the source.
Deleting a source causes all downstream views that depend on objects in the source to break.