Integrating with AWS Lake Formation (Enterprise)
Lake Formation provides access controls for datasets in the AWS Glue Data Catalog and lets you define security policies in one central location that can be shared across multiple tools. Dremio can be configured to consult Lake Formation to verify a user's access to the datasets in an AWS Glue source.
Requirements
- Identity Provider service set up
- (Recommended) SAML connection with AWS
- Permissions set up in Lake Formation
- AWS Glue Data Catalog connected to Dremio
Lake Formation Workflow
When Lake Formation is properly configured, Dremio adheres to the following workflow each time an end user attempts to access, edit, or query datasets with managed privileges:
- Dremio enforces access control. See Configuring Sources for Lake Formation below for access control recommendations.
- Dremio checks each table stored in the AWS Glue source to determine whether it is configured to use Lake Formation for security.
  - If one or more datasets leverage Lake Formation, Dremio determines the user ARNs to use when checking against Lake Formation.
- Dremio queries Lake Formation to determine a user's access level to the datasets using the user/group ARNs.
- If the user has access to the datasets specified within the query's scope, the query proceeds.
- If the user lacks access, the query fails with a permission error.
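The workflow above can be sketched in a few lines of Python. This is only an illustrative model, not Dremio's implementation: the `Table`, `User`, `principal_arns`, and `authorize_query` names are hypothetical, the `grants` dictionary stands in for the answer Lake Formation would return, and the sketch assumes that principal ARNs are formed by appending the Dremio user or group name to the configured prefix. Dremio's own access control check (the first step) is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    uses_lake_formation: bool  # hypothetical flag: table is governed by Lake Formation


@dataclass
class User:
    name: str
    groups: list = field(default_factory=list)


def principal_arns(user, user_prefix, group_prefix):
    """Build the user and group ARNs to check (assumes prefix + name)."""
    return [user_prefix + user.name] + [group_prefix + g for g in user.groups]


def authorize_query(user, tables, grants, user_prefix, group_prefix):
    """Return True if the query may proceed; raise PermissionError otherwise.

    `grants` maps a table name to the set of principal ARNs allowed to read it.
    It is a stand-in for a real Lake Formation lookup.
    """
    managed = [t for t in tables if t.uses_lake_formation]
    if not managed:
        return True  # no table in scope is governed by Lake Formation

    arns = set(principal_arns(user, user_prefix, group_prefix))
    for table in managed:
        allowed = grants.get(table.name, set())
        if not arns & allowed:
            raise PermissionError(f"{user.name} lacks access to {table.name}")
    return True
```

In this model a query that touches only tables not governed by Lake Formation skips the lookup entirely, and a single denied table is enough to fail the whole query.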
Configuring Sources for Lake Formation
Lake Formation integration depends on mapping user and group names in Dremio to the IAM user/group ARNs used by AWS.
To configure an existing or new AWS Glue source, you must set the following options:
- From your existing source or upon creating an Amazon Glue Catalog source, navigate to the Advanced Options tab.
- Enable Enforce AWS Lake Formation access permissions on datasets.
- Fill in the user and group prefix settings as described in the Lake Formation Permissions Reference. For example, if you are using a SAML provider in AWS:
  - User prefix with SAML: arn:aws:iam::<AWS_ACCOUNT_ID>:saml-provider/<PROVIDER_NAME_IN_AWS>:user/
  - Group prefix with SAML: arn:aws:iam::<AWS_ACCOUNT_ID>:saml-provider/<PROVIDER_NAME_IN_AWS>:group/
- Best practice: On the Privileges tab, we recommend enabling the Select privilege for All Users to allow non-admin users to access the AWS Glue source from Dremio.
- Click Save.
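To make the prefix settings concrete, the following minimal sketch builds the SAML user and group prefixes shown above and derives a full principal ARN from a Dremio name. The function names `saml_prefixes` and `to_principal_arn` are hypothetical, the account ID and provider name are placeholders, and the sketch assumes that the full ARN is simply the prefix with the Dremio user or group name appended.

```python
def saml_prefixes(account_id, provider_name):
    """Return (user_prefix, group_prefix) in the SAML format shown above."""
    # AWS account IDs are 12-digit numbers; reject anything else early.
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account ID must be 12 digits")
    base = f"arn:aws:iam::{account_id}:saml-provider/{provider_name}"
    return base + ":user/", base + ":group/"


def to_principal_arn(prefix, name):
    """Append a Dremio user or group name to a prefix (assumed mapping)."""
    if not prefix.endswith("/"):
        raise ValueError("prefix is expected to end with '/'")
    return prefix + name


user_prefix, group_prefix = saml_prefixes("123456789012", "example-idp")
print(to_principal_arn(user_prefix, "alice"))
print(to_principal_arn(group_prefix, "analysts"))
```

Because the mapping is purely textual in this sketch, the user and group names in Dremio would need to match the names that appear in the Lake Formation grants exactly.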
Lake Formation Cell-Level Security
Dremio supports AWS Lake Formation cell-level security with row-level access permissions based on AWS Lake Formation PartiQL expressions. If the user does not have read permissions on a column or cell, Dremio masks the data in that column or cell with a NULL value.
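The masking behavior can be illustrated with a small sketch. It is not how Dremio evaluates PartiQL expressions: the `mask_cells` function and the `can_read` predicate are hypothetical, with `can_read(row, column)` standing in for the outcome of the Lake Formation permission check on a single cell, and Python's `None` standing in for SQL NULL.

```python
def mask_cells(rows, can_read):
    """Return copies of `rows` with unreadable cells replaced by None (NULL).

    `can_read(row, column)` is a hypothetical predicate that reports whether
    the current user may read `column` in `row`.
    """
    return [
        {col: (val if can_read(row, col) else None) for col, val in row.items()}
        for row in rows
    ]


rows = [
    {"id": 1, "region": "EU", "salary": 100},
    {"id": 2, "region": "US", "salary": 200},
]


# Example policy: salary is readable only for rows in the EU region.
def can_read(row, column):
    return column != "salary" or row["region"] == "EU"


masked = mask_cells(rows, can_read)
print(masked)
```

Here the second row keeps its `id` and `region` values, while its `salary` cell is returned as `None`.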
To speed up query planning, Dremio uses the AWS Lake Formation permissions cache for each table. By default, the cache is enabled and reuses previously loaded permissions for up to 3600 seconds (1 hour).
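The caching behavior described above resembles a time-to-live (TTL) cache keyed by table. The following sketch shows the idea under stated assumptions: the `PermissionsCache` class and its `loader` callback are hypothetical and are not Dremio's API, and the only value taken from the text is the default of 3600 seconds.

```python
import time


class PermissionsCache:
    """Per-table cache that reuses loaded permissions until the TTL expires."""

    def __init__(self, loader, ttl_seconds=3600, clock=time.monotonic):
        self._loader = loader      # hypothetical callback: table name -> permissions
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # table name -> (loaded_at, permissions)

    def get(self, table):
        now = self._clock()
        entry = self._entries.get(table)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]        # still fresh: skip the remote lookup
        permissions = self._loader(table)
        self._entries[table] = (now, permissions)
        return permissions
```

One consequence of a cache like this is that a permission change made in Lake Formation may not be visible to queries until the cached entry for that table expires.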