Lake Formation Configuration
This page is intended for customers who wish to configure Lake Formation functionality but may not have all requirements preconfigured.
The following steps outline the process of configuring Dremio to work with an identity provider (IdP), setting up permissions in Lake Formation, and connecting to a Glue source.
- Bootstrapping a Basic IdP
- Configuring Dremio's LDAP Connection
- Connecting the IdP to AWS with SAML
- Setting Permissions in Lake Formation
- Connecting Dremio to the Source
- Testing in Dremio
- Wrapping Up
This guide uses values, such as <password>, which you must make unique to your organization. Please choose secure passwords where possible.
1.0 Bootstrapping a Basic IdP
This section is intended for users not already using an IdP service (e.g., Azure AD) that supports LDAP and SAML. If you already have a service in place, please proceed to Configuring Dremio's LDAP Connection.
1.1 Creating a Virtual Machine (VM) for the IdP
Both Dremio and AWS need to communicate with the IdP server, so we recommend performing these steps on an externally accessible machine. A small cloud VM is a good option, such as an e2-medium (2 vCPU, 4 GB RAM) Compute Engine instance from GCP.
Clone this repository onto the VM.
1.2 Starting Docker Images
Start the containers using the provided docker-compose.yaml file:
docker compose up -d
1.3 Bootstrapping OpenLDAP
Run the following commands:
docker exec -it dremio-lake-formation-demo_openldap_1 bash
# Inside the container:
cd /bootstrap
./bootstrap.sh
exit
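Before continuing, it can help to confirm that the IdP services are reachable from outside the VM, since both Dremio and AWS need to reach them. The following is a minimal sketch, assuming Keycloak listens on port 8080 and OpenLDAP on port 1389 as used elsewhere in this guide. The function names port_is_open and check_idp are hypothetical helpers, not part of the demo repository.

```python
import socket

# Ports used by the demo containers in this guide (assumed to be published on the VM).
DEMO_PORTS = {"Keycloak": 8080, "OpenLDAP": 1389}


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_idp(host: str) -> dict:
    """Check every demo port on the given host and report which ones answer."""
    return {name: port_is_open(host, port) for name, port in DEMO_PORTS.items()}
```

Calling check_idp with your VM's address from a machine outside the VM's network returns a dictionary such as {"Keycloak": True, "OpenLDAP": False}. A False value usually points to a firewall rule or a container that has not started.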
1.4 Synchronizing Keycloak Users with OpenLDAP
- Open your browser and enter the following URL for Keycloak: http://<KEYCLOAK_IP_OR_HOSTNAME>:8080/auth.
- Click Administration Console.
- Log in using the Username admin and the Password <password>-keycloak.
- Click Master from the Realm drop-down.
- Click Add realm.
- Name the realm dremio.
- Click Create.
- Click User Federation.
- From the drop-down, select ldap.
- Apply the following settings:
  - Vendor: Other
  - Username LDAP attribute: cn
  - RDN LDAP attribute: cn
  - UUID LDAP attribute: uid
  - User Object Classes: inetOrgPerson, organizationalPerson
  - Connection URL: ldap://openldap:1389
  - Users DN: ou=users,dc=example,dc=org
  - Bind DN: cn=admin,dc=example,dc=org
  - Bind Credential: <password>-ldap
- Click Test connection and Test authentication to validate the settings.
- Click Save.
- Click Synchronize all users.
- Set periodic Sync Settings, if desired.
1.5 Synchronizing Keycloak Groups with OpenLDAP
- From the LDAP User Federation page, click the Mappers tab.
- Click Create.
- Apply the following settings:
  - Name: group-ldap-mapper
  - Mapper Type: group-ldap-mapper
  - LDAP Groups DN: ou=groups,dc=example,dc=org
  - Group Name LDAP Attribute: cn
  - Group Object Classes: groupOfNames
- Click Save.
- Click Sync LDAP Groups to Keycloak.
2.0 Configuring Dremio's LDAP Connection
2.1 Stopping Dremio
Use the following command to stop the Dremio service:
bin/dremio stop
2.2 Editing dremio.conf
Add the following settings to dremio.conf:
services.coordinator.web.auth.type: "ldap"
services.coordinator.web.auth.config: "ad.json"
The services.coordinator.web.auth.config configuration property replaces services.coordinator.web.auth.ldap_config, which is deprecated.
2.3 Creating the ad.json File
Create a file named ad.json and put it in the same directory as dremio.conf. In Dremio 24+, bindPassword can be encrypted using the dremio-admin encrypt CLI command. Copy and paste the following into the file:
{
  "connectionMode": "PLAIN",
  "servers": [
    {
      "hostname": "<LDAP_IP_OR_HOSTNAME>",
      "port": 1389
    }
  ],
  "names": {
    "baseDN": "dc=example,dc=org",
    "bindDN": "cn=admin,dc=example,dc=org",
    "bindPassword": "<password>-ldap",
    "userFilter": "(&(objectClass=inetOrgPerson))",
    "userAttributes": {
      "baseDNs": [
        "ou=users,dc=example,dc=org"
      ],
      "searchScope": "SUB_TREE",
      "firstname": "cn",
      "id": "cn",
      "lastname": "sn",
      "email": "cn"
    },
    "userGroupRelationship": "GROUP_ENTRY_LISTS_USERS",
    "groupEntryListsUsers": {
      "userEntryUserIdAttribute": "dn",
      "groupEntryUserIdAttribute": "member"
    },
    "groupDNs": [
      "CN={0},ou=groups,dc=example,dc=org"
    ],
    "groupFilter": "(objectClass=groupOfNames)",
    "autoAdminFirstUser": true
  }
}
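A malformed ad.json is easy to produce when editing by hand, so a quick local check before restarting Dremio may save a restart cycle. The sketch below only verifies that the file parses as JSON, that the keys shown in the sample above are present, and that obvious placeholders have been replaced. The key list is taken from this guide's sample and is an assumption, not Dremio's authoritative schema. The function name check_ad_config is hypothetical.

```python
import json

# Keys that appear under "names" in this guide's sample file (an assumption,
# not an official list of required settings).
EXPECTED_NAMES_KEYS = (
    "baseDN", "bindDN", "bindPassword", "userFilter",
    "userAttributes", "userGroupRelationship", "groupDNs", "groupFilter",
)


def check_ad_config(text: str) -> list:
    """Return a list of problems found in the ad.json text (empty if none)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]

    problems = []
    servers = config.get("servers") or []
    if not servers:
        problems.append("no LDAP servers listed")
    for server in servers:
        # A missing hostname is treated the same as an unreplaced placeholder.
        if "<" in str(server.get("hostname", "<")):
            problems.append("hostname is missing or still a placeholder")
        if not isinstance(server.get("port"), int):
            problems.append("port must be an integer")

    names = config.get("names", {})
    for key in EXPECTED_NAMES_KEYS:
        if key not in names:
            problems.append(f"names.{key} is missing")
    password = str(names.get("bindPassword", ""))
    if "<" in password or password.startswith("changeme"):
        problems.append("bindPassword still looks like a placeholder")
    return problems
```

To use it, read the file and print the result, for example with print(check_ad_config(open("ad.json").read())). An empty list means none of these basic checks failed. It does not confirm that Dremio can bind to the LDAP server.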
2.4 Verifying Dremio Logins
Start the Dremio service:
bin/dremio start
- Open your browser and navigate to http://<DREMIO_IP_OR_HOSTNAME>:8080.
- Log in as the admin with the Username admin and Password <password>-ldap.
- Log in as one or more users (user00 through user99), for example with the Username user00 and Password <password>.
The admin user will have universal privileges in Dremio, whereas user accounts will have basic access only.
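Checking many of the demo accounts through the browser is tedious, so the logins can also be exercised from a script. The following is a rough sketch, assuming your Dremio version exposes the /apiv2/login REST endpoint and accepts a JSON body with userName and password fields; verify this against the REST documentation for your release. The helper names build_login_request and can_log_in are hypothetical.

```python
import json
import urllib.error
import urllib.request


def build_login_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build a POST request for Dremio's login endpoint (assumed: /apiv2/login)."""
    body = json.dumps({"userName": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/apiv2/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def can_log_in(base_url: str, username: str, password: str, timeout: float = 10.0) -> bool:
    """Return True if Dremio accepts the credentials, False if it rejects them.

    Network failures (urllib.error.URLError) are deliberately not caught, so an
    unreachable server is not mistaken for a bad password.
    """
    request = build_login_request(base_url, username, password)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # Dremio answered but refused the request, typically bad credentials.
        return False
```

For example, can_log_in("http://<DREMIO_IP_OR_HOSTNAME>:8080", "user00", "<password>") should return True once the LDAP connection is working.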
3.0 Connecting the IdP to AWS with SAML
- Download the descriptor.xml metadata file from http://<KEYCLOAK_IP_OR_HOSTNAME>:8080/auth/realms/dremio/protocol/saml/descriptor (or from your existing IdP).
- Log in to the AWS Console.
- Open IAM Service.
- Click Identity Providers.
- Click Add provider.
- Use the default SAML type.
- Give the provider a name (remember this value, as it is used later in place of <PROVIDER_NAME_IN_AWS>).
- Upload the descriptor.xml file.
- Click Add provider.
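If the download step returned an error page or a truncated file, the upload to IAM is likely to fail. A quick local check of descriptor.xml can catch this early. The sketch below assumes the file is standard SAML 2.0 metadata, where the root is either a single EntityDescriptor or an EntitiesDescriptor wrapper, and lists the entity IDs that declare an IdP role. The function name idp_entity_ids is hypothetical.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the SAML 2.0 metadata specification.
MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"
ENTITY_TAG = f"{{{MD_NS}}}EntityDescriptor"
IDP_TAG = f"{{{MD_NS}}}IDPSSODescriptor"


def idp_entity_ids(descriptor_xml: str) -> list:
    """Return the entityID of every entity that declares an IDPSSODescriptor.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed XML.
    """
    root = ET.fromstring(descriptor_xml)
    # iter() also yields the root itself, so this handles both a bare
    # EntityDescriptor and one wrapped in an EntitiesDescriptor.
    return [
        entity.get("entityID", "")
        for entity in root.iter(ENTITY_TAG)
        if entity.find(IDP_TAG) is not None
    ]
```

An empty result, or a parse error, suggests the file is not IdP metadata and should be downloaded again before uploading it to AWS.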
4.0 Setting Permissions in Lake Formation
4.1 Creating a Table
If you don't already have tables set up in AWS Glue or Lake Formation, you may create one or more now:
- While at the AWS Console, open Lake Formation Service.
- Click Tables.
- Click Create table.
- Fill in settings as desired. If needed, also create a database and S3 bucket.
4.2 Adding Permissions
- From the AWS Console, open Lake Formation Service.
- Click Data permissions.
- Click Grant.
- Apply the following settings:
  - Principals: SAML users and groups
  - SAML and Amazon QuickSight users and groups: arn:aws:iam::<AWS_ACCOUNT_ID>:saml-provider/<PROVIDER_NAME_IN_AWS>:user/user00 OR arn:aws:iam::<AWS_ACCOUNT_ID>:saml-provider/<PROVIDER_NAME_IN_AWS>:group/group0 (Note: userX0 through userX9 are members of groupX for X = [0,9])
  - LF-Tags or catalog resources: Named data catalog resources
  - Databases: <DATABASE_NAME>
  - Tables: <TABLE_NAME>
  - Table and column permissions: Select and/or Super (all)
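The principal ARNs above follow a fixed pattern, and a typo in one usually shows up only later as a missing permission. The sketch below builds the ARN strings and maps a demo user to its group, following the userX0 through userX9 convention described above. The function names are hypothetical, and the account ID and provider name in the usage note are made-up examples.

```python
def saml_principal(account_id: str, provider: str, kind: str, name: str) -> str:
    """Build a Lake Formation SAML principal ARN in the format used by this guide."""
    if kind not in ("user", "group"):
        raise ValueError("kind must be 'user' or 'group'")
    return f"arn:aws:iam::{account_id}:saml-provider/{provider}:{kind}/{name}"


def demo_group_for(username: str) -> str:
    """Map a demo user such as 'user37' to its group ('group3').

    Only valid for the demo accounts user00 through user99.
    """
    digits = username[4:]
    if not (username.startswith("user") and len(digits) == 2 and digits.isdigit()):
        raise ValueError(f"not a demo username: {username!r}")
    # The first digit selects the group: userX0 through userX9 belong to groupX.
    return "group" + digits[0]
```

For example, saml_principal("123456789012", "keycloak", "group", demo_group_for("user37")) produces arn:aws:iam::123456789012:saml-provider/keycloak:group/group3, which is the value to paste into the grant form if you want to give every user in group3 access.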
5.0 Connecting Dremio to the Source
- Open your browser and navigate to Dremio: http://<DREMIO_IP_OR_HOSTNAME>:8080.
- Click the + button next to Data Lakes.
- Click Amazon Glue Catalog.
- Fill out the General tab, including Name and Authentication.
- Select the Advanced Options tab and complete the following:
  - Enable Enforce AWS Lake Formation access permissions on datasets.
  - Fill in the user and group prefix settings per Lake Formation Permissions Reference. For this demo, use SAML:
    - User prefix: arn:aws:iam::<AWS_ACCOUNT_ID>:saml-provider/<PROVIDER_NAME_IN_AWS>:user/
    - Group prefix: arn:aws:iam::<AWS_ACCOUNT_ID>:saml-provider/<PROVIDER_NAME_IN_AWS>:group/
- Under the Privileges tab, you may optionally enable Select privileges for All Users. This allows other users, not just the admin account, to access this source.
- Click Save.
6.0 Testing in Dremio
- Open your browser and navigate to Dremio: http://<DREMIO_IP_OR_HOSTNAME>:8080.
- Log in as admin or one of the user accounts (user00 through user99). Note: userX0 through userX9 are members of groupX for X = [0,9].
- Click on the AWS Glue source that you added previously.
- Explore the table(s) available, making sure to note that Lake Formation permissions are enforced when tables are accessed or queried.
- Log in as other users as well to test permissions.
7.0 Wrapping Up
Once you've completed your tests of Dremio's functionality with Lake Formation, don't forget to shut down or delete the virtual machine running your IdP service to avoid additional uptime charges.