Integrating with AWS Lake Formation (Enterprise)
Lake Formation provides access controls for datasets in Glue and lets you define security policies in a central location that can be shared across multiple tools. Dremio can be configured to consult this service to verify a user's access to the datasets it governs.
Requirements
- Dremio v19.0+
- Identity provider service set up (e.g., Azure AD, LDAP)
- SAML connection with AWS (recommended)
- Permissions set up in Lake Formation
- Glue source connected to Dremio
Lake Formation Workflow
When Lake Formation is properly configured, Dremio adheres to the following workflow each time an end user attempts to access, edit, or query datasets with managed privileges:
- In all cases, Dremio access controls are enforced. See Configuring Sources for Lake Formation below for access control recommendations.
- Dremio checks each table in the Glue source to determine whether it is configured to use Lake Formation for security.
- If one or more datasets use Lake Formation, Dremio determines the user and group ARNs to use when checking against Lake Formation.
- Dremio queries Lake Formation to determine a user's access level to the datasets using the user/group ARNs.
- If the user has access to the datasets specified within the query's scope, the query proceeds. If the user lacks access, the query fails with a permission error.
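The workflow above can be modeled in a few lines of Python to make the order of checks concrete. This is a simplified sketch, not Dremio's implementation: the `Table`, `User`, `principal_arns`, and `authorize_query` names are hypothetical, Lake Formation is replaced by an in-memory dictionary mapping each table name to the ARNs granted access, and Dremio's own privilege check is passed in as a callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Table:
    name: str
    uses_lake_formation: bool  # hypothetical flag: is this Glue table governed by Lake Formation?


@dataclass
class User:
    name: str
    groups: List[str] = field(default_factory=list)


def principal_arns(user: User, user_prefix: str, group_prefix: str) -> List[str]:
    """Map a Dremio user and its groups to ARNs using the configured prefixes."""
    return [user_prefix + user.name] + [group_prefix + g for g in user.groups]


def authorize_query(
    user: User,
    tables: List[Table],
    grants: Dict[str, Set[str]],  # stand-in for Lake Formation: table name -> permitted ARNs
    user_prefix: str,
    group_prefix: str,
    dremio_allows: Callable[[User], bool],  # stand-in for Dremio's own access controls
) -> bool:
    # Step 1: Dremio access controls are enforced in all cases.
    if not dremio_allows(user):
        raise PermissionError(f"{user.name} lacks the required Dremio privileges")
    # Step 2: find the tables that use Lake Formation for security.
    governed = [t for t in tables if t.uses_lake_formation]
    if not governed:
        return True
    # Step 3: determine the user and group ARNs to check.
    arns = principal_arns(user, user_prefix, group_prefix)
    # Step 4: check each governed table against the grants.
    for table in governed:
        permitted = grants.get(table.name, set())
        if not any(arn in permitted for arn in arns):
            # Step 5: a single denied table fails the whole query.
            raise PermissionError(f"{user.name} cannot access {table.name}")
    return True
```

Because the check runs per table, a query that touches one permitted table and one denied table still fails as a whole, consistent with the last step of the workflow.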
Demoing Lake Formation
Demo files and a walkthrough are available to help you test Lake Formation functionality. The demo is intended for customers who do not already have all of the requirements listed above configured.
Configuring Sources for Lake Formation
Lake Formation integration is dependent on the mapping of user/group names in Dremio to the IAM user/group ARNs used by AWS.
To configure an existing or new Glue source, you must set the following options:
- From your existing source, or when creating an Amazon Glue Catalog source, navigate to the Advanced Options tab.
- Enable Enforce AWS Lake Formation access permissions on datasets.
- Fill in the user and group prefix settings as described in the Lake Formation Permissions Reference. For example, if you are using a SAML provider in AWS:
  - User prefix with SAML: arn:aws:iam::<AWS_ACCOUNT_ID>:saml-provider/<PROVIDER_NAME_IN_AWS>:user/
  - Group prefix with SAML: arn:aws:iam::<AWS_ACCOUNT_ID>:saml-provider/<PROVIDER_NAME_IN_AWS>:group/
  Note (best practice): From the Privileges tab, we recommend enabling the Select privilege for All Users, because this allows non-admin users to access this source from Dremio.
- Click Save.
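Assuming Dremio forms each principal ARN by appending the Dremio user or group name to the configured prefix, the names in Dremio must match the names AWS receives from the identity provider exactly. The hypothetical helpers below (`saml_prefixes` and `to_principal_arn`, not part of Dremio or any AWS SDK) build the SAML prefixes from the placeholders above and show the resulting ARN; `123456789012` and `MyIdP` are placeholder values.

```python
from typing import Tuple


def saml_prefixes(account_id: str, provider_name: str) -> Tuple[str, str]:
    """Return (user_prefix, group_prefix) for a SAML provider in AWS."""
    # AWS account IDs are 12-digit strings.
    if not (len(account_id) == 12 and account_id.isdigit()):
        raise ValueError(f"not a 12-digit AWS account ID: {account_id!r}")
    base = f"arn:aws:iam::{account_id}:saml-provider/{provider_name}"
    return f"{base}:user/", f"{base}:group/"


def to_principal_arn(prefix: str, name: str) -> str:
    """Assumed mapping: append the Dremio user or group name to the prefix."""
    return prefix + name


user_prefix, group_prefix = saml_prefixes("123456789012", "MyIdP")
print(to_principal_arn(user_prefix, "alice"))
# arn:aws:iam::123456789012:saml-provider/MyIdP:user/alice
```

If a user is named `alice` in Dremio but `alice@example.com` in the identity provider, the generated ARN will not match any Lake Formation grant, so checking name consistency is a reasonable first step when access is unexpectedly denied.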
Lake Formation Cell-Level Security
Dremio supports AWS Lake Formation cell-level security with row-level access permissions based on AWS Lake Formation PartiQL expressions. If the user does not have read permissions on a column or cell, Dremio masks the data in that column or cell with a NULL value.
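The following sketch illustrates the masking rule under simplifying assumptions; real Lake Formation filter evaluation is more involved. Each data filter is represented as a Python predicate (standing in for a PartiQL row filter expression) paired with the set of columns it grants. Rows that match no filter are dropped, and any cell not granted by a matching filter is returned as `None`, the Python stand-in for NULL. `apply_cell_security` is a hypothetical name.

```python
from typing import Any, Callable, Dict, Iterable, List, Set, Tuple

Row = Dict[str, Any]
# Hypothetical filter: (row predicate standing in for PartiQL, granted columns).
CellFilter = Tuple[Callable[[Row], bool], Set[str]]


def apply_cell_security(rows: Iterable[Row], filters: List[CellFilter]) -> List[Row]:
    result = []
    for row in rows:
        # Union of the columns granted by every filter whose predicate matches this row.
        visible: Set[str] = set()
        matched = False
        for predicate, columns in filters:
            if predicate(row):
                matched = True
                visible |= columns
        if not matched:
            continue  # row-level security: the row is not returned at all
        # Cell-level security: cells the user cannot read are masked with None (NULL).
        result.append({col: (val if col in visible else None) for col, val in row.items()})
    return result
```

With one filter granting all columns for `region = 'EU'` rows and another granting only `region` and `customer` for `region = 'US'` rows, a US row comes back with its `revenue` cell set to `None`, and rows from any other region are not returned.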
To speed up query planning, Dremio caches AWS Lake Formation permissions for each table. By default, the cache is enabled and reuses previously loaded permissions for up to 3600 seconds (1 hour).
Use support keys to disable the cache or customize the cache time-to-live (TTL):
- dremio.glue.lakeformation.cache.enable: To disable permissions caching, set to FALSE.
- dremio.glue.lakeformation.cache.ttl: To specify a TTL for the cache other than the default of 3600 seconds, set to the desired value in seconds.
After you change the value for either support key, you must restart the coordinator node in your Dremio cluster for the change to take effect.
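The caching behavior described above amounts to a per-table cache with an on/off switch and a time-to-live. The hypothetical `PermissionsCache` class below sketches it as an illustration, not Dremio's code: `enabled` stands in for `dremio.glue.lakeformation.cache.enable`, `ttl_seconds` for `dremio.glue.lakeformation.cache.ttl`, and `loader` for whatever call fetches a table's permissions from Lake Formation. The `clock` parameter exists only so the expiry logic can be tested.

```python
import time
from typing import Any, Callable, Dict, Tuple


class PermissionsCache:
    """Hypothetical per-table permissions cache with an enable flag and a TTL."""

    def __init__(
        self,
        loader: Callable[[str], Any],  # fetches permissions for one table
        enabled: bool = True,  # mirrors dremio.glue.lakeformation.cache.enable
        ttl_seconds: float = 3600,  # mirrors dremio.glue.lakeformation.cache.ttl
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._enabled = enabled
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # table -> (expiry, permissions)

    def get(self, table: str) -> Any:
        if not self._enabled:
            # Caching disabled: every lookup goes back to the loader.
            return self._loader(table)
        now = self._clock()
        entry = self._entries.get(table)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh: reuse previously loaded permissions
        # Missing or expired: reload and store with a new expiry time.
        permissions = self._loader(table)
        self._entries[table] = (now + self._ttl, permissions)
        return permissions
```

Both settings are read once, when the cache is constructed, so changing them has no effect on an existing instance. This loosely parallels the requirement to restart the coordinator node after changing either support key.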