Integrating with AWS Lake Formation (Enterprise)
Lake Formation provides access controls for datasets in the AWS Glue Data Catalog. It lets you define security policies in one central location and share them across multiple tools. Dremio can be configured to consult this service to verify a user's access to the datasets in an AWS Glue source.
Requirements
- Dremio v19.0+
- Identity Provider service (e.g., Microsoft Entra ID, LDAP) set up
- (Recommended) SAML connection with AWS
- Permissions set up in Lake Formation
- AWS Glue Data Catalog connected to Dremio
Lake Formation Workflow
When Lake Formation is properly configured, Dremio adheres to the following workflow each time an end user attempts to access, edit, or query datasets with managed privileges:
- Dremio enforces access control. See Configuring Sources for Lake Formation below for access control recommendations.
- Dremio checks each table stored in the Glue source to determine whether it is configured to use Lake Formation for security.
- If one or more datasets leverage Lake Formation, Dremio determines the user ARNs to use when checking against Lake Formation.
- Dremio queries Lake Formation to determine a user's access level to the datasets using the user/group ARNs.
- If the user has access to the datasets specified within the query's scope, the query proceeds.
- If the user lacks access, the query fails with a permission error.
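The Lake Formation portion of the workflow above can be sketched roughly as follows. This is a minimal illustration, not Dremio's implementation: the `Table` class, the `authorize_query` function, and the `lf_grants` dictionary (a stand-in for the answers Lake Formation would return) are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Table:
    """Hypothetical stand-in for a dataset in a Glue source."""
    name: str
    uses_lake_formation: bool


def authorize_query(tables, principal_arns, lf_grants):
    """Return True if every Lake Formation-managed table is accessible.

    tables:         tables within the query's scope
    principal_arns: user/group ARNs resolved for the current user
    lf_grants:      assumed mapping of table name -> set of ARNs with access
                    (stands in for a real Lake Formation lookup)
    """
    for table in tables:
        # Tables not governed by Lake Formation are skipped here.
        if not table.uses_lake_formation:
            continue
        allowed = lf_grants.get(table.name, set())
        # Access is granted if any of the user's ARNs holds a grant.
        if not allowed.intersection(principal_arns):
            raise PermissionError(f"No Lake Formation access to {table.name}")
    return True
```

In this sketch, a query that touches one managed table and one unmanaged table succeeds only if at least one of the user's ARNs appears in the grants for the managed table; otherwise it fails with a permission error, mirroring the last two steps of the workflow.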
Demoing Lake Formation
Demo files and a walkthrough are available to help you test Lake Formation functionality. The demo files and walkthrough are intended for users who have not configured all of the requirements listed above.
Configuring Sources for Lake Formation
Lake Formation integration depends on mapping user/group names in Dremio to the IAM user/group ARNs used by AWS.
To configure an existing or new AWS Glue source, you must set the following options:
- In the settings of an existing AWS Glue Data Catalog source, or while creating a new one, navigate to the Advanced Options tab.
- Enable Enforce AWS Lake Formation access permissions on datasets.
- Fill in the user and group prefix settings as instructed in the Lake Formation Permissions Reference. For example, if you are using a SAML provider in AWS:
- User prefix with SAML:
arn:aws:iam::<AWS_ACCOUNT_ID>:saml-provider/<PROVIDER_NAME_IN_AWS>:user/
- Group prefix with SAML:
arn:aws:iam::<AWS_ACCOUNT_ID>:saml-provider/<PROVIDER_NAME_IN_AWS>:group/
Note (Best Practice): On the Privileges tab, we recommend enabling the Select privilege for All Users to allow non-admin users to access the AWS Glue source from Dremio.
- Click Save.
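To make the prefix settings concrete, the sketch below shows how a Dremio user or group name might combine with the configured prefixes to form the ARNs checked against Lake Formation. It assumes the mapping is a plain concatenation of prefix and name, which this page does not state explicitly. The account ID `123456789012`, the provider name `ExampleProvider`, and the helper `principal_arns` are placeholders invented for illustration.

```python
# Placeholder values; substitute your own AWS account ID and SAML provider name.
USER_PREFIX = "arn:aws:iam::123456789012:saml-provider/ExampleProvider:user/"
GROUP_PREFIX = "arn:aws:iam::123456789012:saml-provider/ExampleProvider:group/"


def principal_arns(user, groups, user_prefix=USER_PREFIX, group_prefix=GROUP_PREFIX):
    """Build candidate ARNs for a user and the groups the user belongs to.

    Assumption: each ARN is simply the configured prefix followed by the
    Dremio user or group name.
    """
    arns = [user_prefix + user]
    arns.extend(group_prefix + group for group in groups)
    return arns


if __name__ == "__main__":
    for arn in principal_arns("alice", ["analysts"]):
        print(arn)
```

Because the prefixes end with `user/` and `group/`, the names in Dremio need to match the names that Lake Formation permissions were granted to for the resulting ARNs to line up.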
Lake Formation Cell-Level Security
Dremio supports AWS Lake Formation cell-level security with row-level access permissions based on AWS Lake Formation PartiQL expressions. If the user does not have read permissions on a column or cell, Dremio masks the data in that column or cell with a NULL value.
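The masking behavior can be pictured with a small, simplified sketch. It is not how Dremio evaluates PartiQL expressions; the function `mask_rows`, the `row_filter` callable (standing in for a row-level filter expression), and the sample data are hypothetical and only illustrate the effect of replacing unreadable values with NULL (`None` in Python).

```python
def mask_rows(rows, readable_columns, row_filter=lambda row: True):
    """Return the rows a user may see, with unreadable columns set to None.

    rows:             list of dicts, one per table row
    readable_columns: set of column names the user may read
    row_filter:       assumed stand-in for a row-level filter expression
    """
    visible = []
    for row in rows:
        # Rows excluded by the row-level filter are not returned at all.
        if not row_filter(row):
            continue
        # Values in columns the user cannot read are masked with None (NULL).
        visible.append(
            {col: (val if col in readable_columns else None) for col, val in row.items()}
        )
    return visible


if __name__ == "__main__":
    data = [
        {"region": "EU", "customer": "Acme", "revenue": 100},
        {"region": "US", "customer": "Globex", "revenue": 250},
    ]
    print(mask_rows(data, {"region", "customer"}, lambda r: r["region"] == "EU"))
```

Here a user limited to EU rows and to the `region` and `customer` columns would still see a `revenue` column in the result, but with every value NULL.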
To speed up query planning, Dremio uses the AWS Lake Formation permissions cache for each table. By default, the cache is enabled and reuses previously loaded permissions for up to 3600 seconds (1 hour).
Use support keys to disable the cache or customize the cache time-to-live (TTL):
- dremio.glue.lakeformation.cache.enable: To disable permissions caching, set to FALSE.
- dremio.glue.lakeformation.cache.ttl: To specify a TTL for the cache instead of the default 3600 seconds, set to the desired value in seconds.
After you change the value for either support key, you must restart the coordinator node in your Dremio cluster for the change to take effect.
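The effect of the two support keys can be illustrated with a generic time-to-live cache. The sketch below is a conceptual model only, not Dremio's cache: the class `PermissionsCache`, its `loader` callback (standing in for a permissions lookup against Lake Formation), and the injectable `clock` are hypothetical. The `enabled` and `ttl_seconds` parameters play the roles of the `cache.enable` and `cache.ttl` keys.

```python
import time


class PermissionsCache:
    """Per-table permissions cache with a time-to-live (illustrative only)."""

    def __init__(self, loader, ttl_seconds=3600, enabled=True, clock=time.monotonic):
        self.loader = loader          # called on a miss to fetch permissions
        self.ttl_seconds = ttl_seconds
        self.enabled = enabled
        self.clock = clock            # injectable so expiry can be tested
        self._entries = {}            # table name -> (load time, permissions)

    def get(self, table):
        # With caching disabled, every request goes to the loader.
        if not self.enabled:
            return self.loader(table)
        now = self.clock()
        entry = self._entries.get(table)
        # Reuse an entry only while it is younger than the TTL.
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]
        permissions = self.loader(table)
        self._entries[table] = (now, permissions)
        return permissions
```

A shorter TTL means permission changes made in Lake Formation are picked up sooner, at the cost of more frequent lookups during query planning; disabling the cache takes that trade-off to its extreme.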