Version: 24.3.x

Integrating with AWS Lake Formation (Enterprise)

Lake Formation provides access controls for datasets in Glue. It lets you define security policies in one central location and share them across multiple tools. Dremio can be configured to consult this service to verify a user's access to the datasets in a Glue source.

Requirements

Lake Formation Workflow

When Lake Formation is properly configured, Dremio adheres to the following workflow each time an end user attempts to access, edit, or query datasets with managed privileges:

  1. In all cases, Dremio access controls are enforced. See Configuring Sources for Lake Formation below for access control recommendations.
  2. Dremio checks each table stored in the Glue source to determine whether it is configured to use Lake Formation for security.
  3. If one or more datasets leverage Lake Formation, Dremio determines the user ARNs to use when checking against Lake Formation.
  4. Dremio queries Lake Formation to determine a user's access level to the datasets using the user/group ARNs.
  5. If the user has access to every dataset within the query's scope, the query proceeds. If the user lacks access, the query fails with a permission error.
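
The steps above can be sketched in a few lines of Python. This is only an illustrative model, not Dremio's implementation: `Table`, `User`, `can_query`, `dremio_allows`, `resolve_arns`, and the `lf_grants` set are all hypothetical stand-ins for Dremio's privilege checks and for the Lake Formation service.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    uses_lake_formation: bool  # hypothetical flag: table is governed by Lake Formation


@dataclass
class User:
    name: str
    groups: list = field(default_factory=list)


def can_query(user, tables, dremio_allows, lf_grants, resolve_arns):
    """Return True if the query may proceed, False if it should fail.

    dremio_allows(user, table) -> bool  stands in for Dremio access controls.
    lf_grants is a set of (arn, table_name) pairs standing in for Lake Formation.
    resolve_arns(user) -> list of user/group ARNs for this user.
    """
    # Step 1: Dremio access controls are always enforced.
    if not all(dremio_allows(user, t) for t in tables):
        return False
    # Step 2: find the tables that are configured to use Lake Formation.
    governed = [t for t in tables if t.uses_lake_formation]
    if not governed:
        return True
    # Step 3: determine the ARNs to check.
    arns = resolve_arns(user)
    # Steps 4 and 5: every governed table must be granted to at least one ARN.
    return all(any((arn, t.name) in lf_grants for arn in arns) for t in governed)
```

In this model, a user who passes the Dremio check but has no matching grant on even one governed table is denied, which corresponds to the permission error in step 5.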

Demoing Lake Formation

Both demo files and a walkthrough are available to help you test Lake Formation functionality. This demo is intended for customers that do not have all of the requirements listed above preconfigured.

Configuring Sources for Lake Formation

Lake Formation integration depends on mapping the user and group names in Dremio to the IAM user and group ARNs used by AWS.

To configure an existing or new Glue source, you must set the following options:

  1. Open the settings of your existing source, or begin creating an Amazon Glue Catalog source, and navigate to the Advanced Options tab.

  2. Enable Enforce AWS Lake Formation access permissions on datasets.

  3. Fill in the user and group prefix settings as described in the Lake Formation Permissions Reference. For example, if you are using a SAML provider in AWS:

    • User prefix with SAML: arn:aws:iam::<AWS_ACCOUNT_ID>:saml-provider/<PROVIDER_NAME_IN_AWS>:user/
    • Group prefix with SAML: arn:aws:iam::<AWS_ACCOUNT_ID>:saml-provider/<PROVIDER_NAME_IN_AWS>:group/
    Note: As a best practice, from the Privileges tab, enable the Select privilege for All Users so that non-admin users can access this source from Dremio.

  4. Click Save.
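
To make the prefix settings concrete, the sketch below builds the two SAML prefixes shown in step 3 and prepends them to a Dremio user or group name. It assumes the mapping is a plain concatenation of prefix and name, which the prefix wording suggests but this page does not state outright. The helper names `saml_prefixes` and `to_arn`, the account ID, and the provider name are hypothetical.

```python
def saml_prefixes(account_id, provider_name):
    """Return (user_prefix, group_prefix) in the SAML format shown in step 3."""
    base = f"arn:aws:iam::{account_id}:saml-provider/{provider_name}"
    return base + ":user/", base + ":group/"


def to_arn(prefix, name):
    """Assumption: the ARN is the configured prefix followed by the Dremio name."""
    return prefix + name


# Example values are placeholders, not real identifiers.
user_prefix, group_prefix = saml_prefixes("123456789012", "MyIdP")
alice_arn = to_arn(user_prefix, "alice")
analysts_arn = to_arn(group_prefix, "analysts")
```

With these placeholder values, `alice_arn` is `arn:aws:iam::123456789012:saml-provider/MyIdP:user/alice`. The user and group names in Dremio must therefore match the names that appear in the ARNs granted in Lake Formation.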

Lake Formation Cell-Level Security

Dremio supports AWS Lake Formation cell-level security with row-level access permissions based on AWS Lake Formation PartiQL expressions. If the user does not have read permissions on a column or cell, Dremio masks the data in that column or cell with a NULL value.
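
As a rough illustration of the masking behavior, the following sketch replaces every cell the user cannot read with `None`, the Python analogue of NULL. The function `mask_cells` and the `can_read` callback are hypothetical. In practice the decision comes from Lake Formation data filters, not from a Python function.

```python
def mask_cells(rows, can_read):
    """Return copies of rows with unreadable cells replaced by None (NULL).

    rows is a list of dicts mapping column name to value.
    can_read(row, column) -> bool is a stand-in for the Lake Formation decision.
    """
    return [
        {col: (val if can_read(row, col) else None) for col, val in row.items()}
        for row in rows
    ]


rows = [
    {"name": "Ana", "region": "EU", "salary": 100},
    {"name": "Bo", "region": "US", "salary": 200},
]

# Hypothetical policy: salary is readable only for rows where region is EU.
masked = mask_cells(rows, lambda row, col: col != "salary" or row["region"] == "EU")
```

Here the second row keeps its `name` and `region` values, but its `salary` cell comes back as `None`.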

To speed up query planning, Dremio caches the AWS Lake Formation permissions for each table. By default, the cache is enabled and reuses previously loaded permissions for up to 3600 seconds (1 hour).

Use support keys to disable the cache or customize the cache time-to-live (TTL):

  • dremio.glue.lakeformation.cache.enable: To disable permissions caching, set to FALSE.
  • dremio.glue.lakeformation.cache.ttl: To specify a TTL for the cache instead of the default 3600 seconds, set to the desired value in seconds.

After you change the value for either support key, you must restart the coordinator node in your Dremio cluster for the change to take effect.
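
The effect of the two support keys can be pictured with a small time-to-live cache. This is a conceptual sketch only. `PermissionsCache`, its `loader` callback, and the injectable `clock` are hypothetical and do not reflect Dremio internals. The `enabled` and `ttl_seconds` arguments play the roles of `dremio.glue.lakeformation.cache.enable` and `dremio.glue.lakeformation.cache.ttl`.

```python
import time


class PermissionsCache:
    """Per-table permissions cache with a time-to-live, in seconds."""

    def __init__(self, loader, ttl_seconds=3600, enabled=True, clock=time.monotonic):
        self._loader = loader    # loader(table) fetches fresh permissions
        self._ttl = ttl_seconds  # how long a cached entry may be reused
        self._enabled = enabled  # False means every lookup calls the loader
        self._clock = clock      # injectable so the expiry logic can be tested
        self._entries = {}       # table -> (time loaded, permissions)

    def get(self, table):
        if not self._enabled:
            return self._loader(table)
        now = self._clock()
        entry = self._entries.get(table)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: reuse the cached permissions
        value = self._loader(table)
        self._entries[table] = (now, value)
        return value
```

The trade-off is the usual one for a TTL cache: a longer TTL means fewer permission lookups during planning, but a permission change in Lake Formation can take up to the TTL to become visible.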