Version: 24.3.x

Lake Formation Configuration

This page is for customers who want to configure Lake Formation functionality but do not yet have every prerequisite in place.

The following steps outline the process of configuring Dremio to work with an identity provider (IdP), setting up permissions in Lake Formation, and connecting to a Glue source.

  1. Bootstrapping a Basic IdP
    1. Creating a Virtual Machine (VM) for the IdP
    2. Starting Docker Images
    3. Bootstrapping OpenLDAP
    4. Synchronizing Keycloak Users with OpenLDAP
    5. Synchronizing Keycloak Groups with OpenLDAP
  2. Configuring Dremio's LDAP Connection
    1. Stopping Dremio
    2. Editing dremio.conf
    3. Creating the ad.json File
    4. Verifying Dremio Logins
  3. Connecting the IdP to AWS with SAML
  4. Setting Permissions in Lake Formation
    1. Creating a Table
    2. Adding Permissions
  5. Connecting Dremio to the Source
  6. Testing in Dremio
  7. Wrapping Up
note

This guide uses placeholder values, such as <password>, that you must replace with values unique to your organization. Choose secure passwords where possible.

1.0 Bootstrapping a Basic IdP

note

This section is intended for users not already using an IdP service (e.g., Azure AD) that supports LDAP and SAML. If you already have a service in place, please proceed to Configuring Dremio's LDAP Connection.

1.1 Creating a Virtual Machine (VM) for the IdP

Both Dremio and AWS need to communicate with the IdP server, so we recommend performing these steps on an externally accessible machine. A small cloud VM is a good option, for example an e2-medium (2 vCPU, 4 GB RAM) Compute Engine instance on GCP.

Clone this repository onto the VM.

1.2 Starting Docker Images

From the directory that contains the provided docker-compose.yaml file, start the containers:

docker compose up -d

1.3 Bootstrapping OpenLDAP

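Run the following commands to open a shell in the OpenLDAP container and execute the bootstrap script:

# If the container is not found, run `docker ps` to check its name;
# Compose v2 joins name parts with hyphens rather than underscores.
docker exec -it dremio-lake-formation-demo_openldap_1 bash
# Inside the container:
cd /bootstrap
./bootstrap.sh
exit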

1.4 Synchronizing Keycloak Users with OpenLDAP

  1. Open your browser and enter the following URL for Keycloak: http://<KEYCLOAK_IP_OR_HOSTNAME>:8080/auth.
  2. Click Administration Console.
  3. Log in with the Username admin and the Password <password>-keycloak.
  4. Click Master from the Realm drop-down.
  5. Click Add realm.
  6. Name the realm dremio.
  7. Click Create.
  8. Click User Federation.
  9. From the drop-down, select ldap.
  10. Apply the following settings:
    • Vendor: Other
    • Username LDAP attribute: cn
    • RDN LDAP attribute: cn
    • UUID LDAP attribute: uid
    • User Object Classes: inetOrgPerson, organizationalPerson
    • Connection URL: ldap://openldap:1389
    • Users DN: ou=users,dc=example,dc=org
    • Bind DN: cn=admin,dc=example,dc=org
    • Bind Credential: <password>-ldap
  11. Click Test connection and Test authentication to validate the settings.
  12. Click Save.
  13. Click Synchronize all users.
  14. Set periodic Sync Settings, if desired.

1.5 Synchronizing Keycloak Groups with OpenLDAP

  1. From the LDAP User Federation page, click the Mappers tab.
  2. Click Create.
  3. Apply the following settings:
    • Name: group-ldap-mapper
    • Mapper Type: group-ldap-mapper
    • LDAP Groups DN: ou=groups,dc=example,dc=org
    • Group Name LDAP Attribute: cn
    • Group Object Classes: groupOfNames
  4. Click Save.
  5. Click Sync LDAP Groups to Keycloak.

2.0 Configuring Dremio's LDAP Connection

2.1 Stopping Dremio

Use the following command to stop the Dremio service:

bin/dremio stop

2.2 Editing dremio.conf

Add the following settings to dremio.conf:

services.coordinator.web.auth.type: "ldap"
services.coordinator.web.auth.config: "ad.json"
note

The services.coordinator.web.auth.config configuration property replaces services.coordinator.web.auth.ldap_config, which is deprecated.

2.3 Creating the ad.json File

Create a file named ad.json in the same directory as dremio.conf. Paste the following content into the file, replacing the placeholder values with your own:

note

In Dremio 24+, bindPassword can be encrypted using the dremio-admin encrypt CLI command.

{
  "connectionMode": "PLAIN",
  "servers": [
    {
      "hostname": "<LDAP_IP_OR_HOSTNAME>",
      "port": 1389
    }
  ],
  "names": {
    "baseDN": "dc=example,dc=org",
    "bindDN": "cn=admin,dc=example,dc=org",
    "bindPassword": "<password>-ldap",
    "userFilter": "(&(objectClass=inetOrgPerson))",
    "userAttributes": {
      "baseDNs": [
        "ou=users,dc=example,dc=org"
      ],
      "searchScope": "SUB_TREE",
      "firstname": "cn",
      "id": "cn",
      "lastname": "sn",
      "email": "cn"
    },
    "userGroupRelationship": "GROUP_ENTRY_LISTS_USERS",
    "groupEntryListsUsers": {
      "userEntryUserIdAttribute": "dn",
      "groupEntryUserIdAttribute": "member"
    },
    "groupDNs": [
      "CN={0},ou=groups,dc=example,dc=org"
    ],
    "groupFilter": "(objectClass=groupOfNames)",
    "autoAdminFirstUser": true
  }
}
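If you prefer to generate ad.json rather than edit it by hand, the following is a minimal sketch in Python. It reproduces the structure shown above and fills in the host, bind password, and base DN from arguments. The function names (build_ad_config, render_ad_json) and the sample host name and password are illustrative assumptions, not part of Dremio.

```python
import json


def build_ad_config(ldap_host: str, bind_password: str,
                    base_dn: str = "dc=example,dc=org", port: int = 1389) -> dict:
    """Return the ad.json structure from this guide as a Python dict.

    The layout (ou=users, ou=groups, cn=admin under base_dn) mirrors the
    demo directory used in this guide; adjust it for your own directory.
    """
    return {
        "connectionMode": "PLAIN",
        "servers": [{"hostname": ldap_host, "port": port}],
        "names": {
            "baseDN": base_dn,
            "bindDN": f"cn=admin,{base_dn}",
            "bindPassword": bind_password,
            "userFilter": "(&(objectClass=inetOrgPerson))",
            "userAttributes": {
                "baseDNs": [f"ou=users,{base_dn}"],
                "searchScope": "SUB_TREE",
                "firstname": "cn",
                "id": "cn",
                "lastname": "sn",
                "email": "cn",
            },
            "userGroupRelationship": "GROUP_ENTRY_LISTS_USERS",
            "groupEntryListsUsers": {
                "userEntryUserIdAttribute": "dn",
                "groupEntryUserIdAttribute": "member",
            },
            # "{0}" is kept literally, exactly as in the file shown above.
            "groupDNs": [f"CN={{0}},ou=groups,{base_dn}"],
            "groupFilter": "(objectClass=groupOfNames)",
            "autoAdminFirstUser": True,
        },
    }


def render_ad_json(config: dict) -> str:
    """Serialize the config as indented JSON with a trailing newline."""
    return json.dumps(config, indent=2) + "\n"


# Hypothetical values for illustration only.
ad_json_text = render_ad_json(
    build_ad_config("ldap.example.internal", "example-only-ldap")
)
print(ad_json_text)
```

Write the returned text to ad.json next to dremio.conf. Generating the file this way keeps the DNs consistent with one another and guarantees that the result is valid JSON.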

2.4 Verifying Dremio Logins

  1. Start the Dremio service:

    bin/dremio start
  2. Open your browser and navigate to http://<DREMIO_IP_OR_HOSTNAME>:8080.

  3. Log in as the administrator with the Username admin and the Password <password>-ldap.

  4. Log in as one or more regular users (user00 through user99), for example with the Username user00 and the Password <password>.

The admin user has full privileges in Dremio, whereas the regular user accounts have basic access only.
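To check several of the demo accounts without using the browser, a login request can be scripted. The sketch below is written in Python and assumes that your Dremio version accepts a POST to /apiv2/login with a JSON body containing userName and password; confirm the endpoint against the REST API documentation for your release before relying on it. The helper names and the sample host are hypothetical.

```python
import json
import urllib.error
import urllib.request


def build_login_request(base_url: str, username: str,
                        password: str) -> urllib.request.Request:
    """Build (but do not send) a login request for the given user.

    Assumption: the coordinator exposes POST /apiv2/login and expects
    a JSON body with "userName" and "password".
    """
    body = json.dumps({"userName": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/apiv2/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def can_log_in(base_url: str, username: str, password: str,
               timeout: float = 10.0) -> bool:
    """Send the login request and report whether it returned HTTP 200.

    HTTP error statuses (for example, rejected credentials) return False.
    Connection problems raise urllib.error.URLError so that an unreachable
    server is not mistaken for a failed login.
    """
    request = build_login_request(base_url, username, password)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False


# Building the request has no side effects; nothing is sent here.
login_request = build_login_request(
    "http://dremio.example.internal:8080", "user00", "example-only"
)
print(login_request.get_method(), login_request.full_url)
```

Call can_log_in with your own Dremio URL and credentials, for example in a loop over user00 through user99, to confirm that the LDAP connection authenticates each account.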

3.0 Connecting the IdP to AWS with SAML

  1. Download the descriptor.xml metadata file from http://<KEYCLOAK_IP_OR_HOSTNAME>:8080/auth/realms/dremio/protocol/saml/descriptor (or from your existing IdP).
  2. Log in to the AWS Console.
  3. Open the IAM service.
  4. Click Identity Providers.
  5. Click Add provider.
  6. Use the default SAML type.
  7. Give the provider a name. Note this value, because later steps use it in place of <PROVIDER_NAME_IN_AWS>.
  8. Upload the descriptor.xml file.
  9. Click Add provider.
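Before uploading descriptor.xml, it can help to confirm that the downloaded file really is SAML metadata and not, for example, an HTML error page. The following Python sketch builds the descriptor URL used in step 1 and checks that the root element of a document is EntityDescriptor or EntitiesDescriptor in the SAML 2.0 metadata namespace. The function names and the sample document are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the SAML 2.0 metadata specification.
SAML_METADATA_NS = "urn:oasis:names:tc:SAML:2.0:metadata"


def descriptor_url(keycloak_host: str, realm: str = "dremio", port: int = 8080) -> str:
    """Return the Keycloak SAML descriptor URL used in this guide."""
    return f"http://{keycloak_host}:{port}/auth/realms/{realm}/protocol/saml/descriptor"


def looks_like_saml_metadata(xml_text: str) -> bool:
    """Return True if the text parses as XML with a SAML metadata root element."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return root.tag in {
        f"{{{SAML_METADATA_NS}}}EntityDescriptor",
        f"{{{SAML_METADATA_NS}}}EntitiesDescriptor",
    }


# A deliberately tiny stand-in for a real descriptor, for illustration only.
sample_descriptor = (
    '<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata" '
    'entityID="http://keycloak.example.internal:8080/auth/realms/dremio"/>'
)
print(descriptor_url("keycloak.example.internal"))
print(looks_like_saml_metadata(sample_descriptor))
```

This is only a structural sanity check. It does not validate signatures, certificates, or the endpoints listed in the metadata.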

4.0 Setting Permissions in Lake Formation

4.1 Creating a Table

If you don't already have tables set up in AWS Glue or Lake Formation, you may create one or more now:

  1. In the AWS Console, open the Lake Formation service.
  2. Click Tables.
  3. Click Create table.
  4. Fill in settings as desired. If needed, also create a database and S3 bucket.

4.2 Adding Permissions

  1. In the AWS Console, open the Lake Formation service.
  2. Click Data permissions.
  3. Click Grant.
  4. Apply the following settings:
    • Principals: SAML users and groups
    • SAML and Amazon QuickSight users and groups: arn:aws:iam::<AWS_ACCOUNT_ID>:saml-provider/<PROVIDER_NAME_IN_AWS>:user/user00 or arn:aws:iam::<AWS_ACCOUNT_ID>:saml-provider/<PROVIDER_NAME_IN_AWS>:group/group0 (Note: userX0 through userX9 are members of groupX, where X is a digit from 0 through 9. For example, user30 through user39 belong to group3.)
    • LF-Tags or catalog resources: Named data catalog resources
    • Databases: <DATABASE_NAME>
    • Tables: <TABLE_NAME>
    • Table and column permissions: Select, Super (all), or both
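The principal ARNs in the settings above follow a fixed pattern, so they are easy to mistype. As a minimal sketch, the Python helpers below assemble the ARN for a SAML user or group and derive the demo group for a demo user according to the membership note above. The function names and the sample account ID and provider name are hypothetical.

```python
import re


def saml_principal_arn(account_id: str, provider_name: str,
                       kind: str, name: str) -> str:
    """Build a SAML principal ARN in the format used in this guide.

    kind must be "user" or "group".
    """
    if kind not in ("user", "group"):
        raise ValueError(f"kind must be 'user' or 'group', got {kind!r}")
    return f"arn:aws:iam::{account_id}:saml-provider/{provider_name}:{kind}/{name}"


def demo_group_for(username: str) -> str:
    """Return the demo group for a demo user (userXY belongs to groupX)."""
    match = re.fullmatch(r"user([0-9])([0-9])", username)
    if match is None:
        raise ValueError(f"not a demo user name: {username!r}")
    return f"group{match.group(1)}"


# Hypothetical account ID and provider name, for illustration only.
account_id = "123456789012"
provider_name = "keycloak-dremio"

user_arn = saml_principal_arn(account_id, provider_name, "user", "user37")
group_arn = saml_principal_arn(
    account_id, provider_name, "group", demo_group_for("user37")
)
print(user_arn)
print(group_arn)
```

Granting a permission to the group ARN instead of individual user ARNs covers all ten demo users in that group with a single grant.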

5.0 Connecting Dremio to the Source

  1. Open your browser and navigate to Dremio: http://<DREMIO_IP_OR_HOSTNAME>:8080.
  2. Click the + button next to Data Lakes.
  3. Click Amazon Glue Catalog.
  4. Fill out the General tab, including Name and Authentication.
  5. Select the Advanced Options tab and complete the following:
    1. Enable Enforce AWS Lake Formation access permissions on datasets.
    2. Fill in the user and group prefix settings as described in the Lake Formation Permissions Reference. For this demo, use SAML:
      • User prefix: arn:aws:iam::<AWS_ACCOUNT_ID>:saml-provider/<PROVIDER_NAME_IN_AWS>:user/
      • Group prefix: arn:aws:iam::<AWS_ACCOUNT_ID>:saml-provider/<PROVIDER_NAME_IN_AWS>:group/
  6. Under the Privileges tab, you may optionally enable Select privileges for All Users. This allows other users, not just the admin account, to access this source.
  7. Click Save.
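The user and group prefixes entered in the Advanced Options tab need to line up with the principals that were granted permissions in Lake Formation. The Python sketch below assumes that a grant applies to a Dremio user or group only when its ARN is exactly the configured prefix followed by the user or group name; that reading is an assumption based on the prefix settings above, so confirm it against the Lake Formation Permissions Reference. The function name and the sample values are hypothetical.

```python
def classify_grant(principal_arn: str, user_prefix: str, group_prefix: str) -> str:
    """Return "user" or "group" if the ARN is a prefix plus a plain name,
    otherwise "none".
    """
    for kind, prefix in (("user", user_prefix), ("group", group_prefix)):
        name = principal_arn[len(prefix):]
        # Require a non-empty name with no further path segments.
        if principal_arn.startswith(prefix) and name and "/" not in name:
            return kind
    return "none"


# Hypothetical account ID and provider name, for illustration only.
base = "arn:aws:iam::123456789012:saml-provider/keycloak-dremio"
user_prefix = f"{base}:user/"
group_prefix = f"{base}:group/"

grants = [
    f"{base}:user/user00",
    f"{base}:group/group0",
    # A grant made against a differently named provider does not match.
    "arn:aws:iam::123456789012:saml-provider/keycloak:user/user00",
]
for arn in grants:
    print(classify_grant(arn, user_prefix, group_prefix), arn)
```

A result of "none" for a grant that you expected to apply usually points to a mismatch in the account ID, the provider name, or a missing trailing slash in one of the prefixes.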

6.0 Testing in Dremio

  1. Open your browser and navigate to Dremio: http://<DREMIO_IP_OR_HOSTNAME>:8080.

  2. Log in as admin or as one of the user accounts (user00 through user99).

    note

    userX0 through userX9 are members of groupX, where X is a digit from 0 through 9. For example, user30 through user39 belong to group3.

  3. Click on the AWS Glue source that you added previously.

  4. Explore the available tables and confirm that Lake Formation permissions are enforced when tables are accessed or queried.

  5. Log in as other users as well to test permissions.

7.0 Wrapping Up

Once you've finished testing Dremio with Lake Formation, shut down or delete the virtual machine running your IdP service to avoid further charges.