Skip to main content
Version: current [25.0.x]

Dremio-Native: Row-Access & Column-Masking Policies Enterprise

Row-access and column-masking policies may be applied to tables, views, and columns via user-defined functions (UDFs).

This allows you to filter sensitive data based upon the rules and conditions you need to maintain compliance or adhere to regulatory requirements, while also removing the need to produce a secondary set of data with protected information manually removed.

note

Dremio only supports one data governance policy manager at a time, so you can use either Dremio or Ranger as a policy manager but not both at the same time.

When adding a new Hive source, you have the following options for Hive authorization clients:

  • Storage-Based with User Impersonation
  • SQL-Based
  • Ranger-Based

Requirements

The following source types are supported:

  • Object Storage - S3, Azure Storage, GCS, HDFS
  • Metastores - AWS Glue, Hive Metastore
  • Databases - Oracle, SQL Server, etc.
note

Row-access and column-masking policies are not supported for Nessie Catalogs.

The following restrictions apply:

  • Only users with the ADMIN role may create UDFs
  • A function may only have one owner (the user that created the UDF, by default), which may be transferred using the GRANT OWNERSHIP command
  • The owner of a UDF that serves as a policy must have the EXECUTE privilege for that UDF.
  • Use of the is_member function shown in the examples below is not currently available to organizations using SSO role authentication.

Column-Masking

Column-masking is a way to maskor scrambleprivate data at the column-level dynamically prior to query execution. For example, the owner of a table or view may apply a policy to a column to only display the year of a date or the last four digits of a credit card.

Column-masking policies may be any UDF with a scalar return type that is identical to the data type of the column on which it is applied. However, only one column-masking policy may be applied to each column.

In the following example of a user-defined function, only users within in the Accounting department in the state of California (CA) may see an entry's social security number (ssn) if the record lists an income above $10,000, otherwise the SSN value is masked with XXX-XX-.

Column-masking
CREATE FUNCTION protect_ssn (ssn VARCHAR(11))
RETURNS VARCHAR(11)
RETURN SELECT CASE WHEN query_user()='jdoe@dremio.com' OR is_member('Accounting') THEN ssn
ELSE CONCAT('XXX-XXX-', SUBSTR(ssn,9,3))
END;

Row-Access

Row-access policies are a way to control which records in a table or view are returned for specific users and roles. For example, the owner of a table or view may apply a policy that filters out customers from a specific country unless the user running the query has a specific role.

Row-access
CREATE FUNCTION country_filter (country VARCHAR)
RETURNS BOOLEAN
RETURN SELECT query_user()='jdoe@dremio.com' OR (is_member('Accounting') AND country='CA');

Row-access policies may be any boolean UDF applied to the table or view. The return value of the UDF is treated logically in a query as an AND operator included in a WHERE clause. The return type of the UDF must be BOOLEAN, otherwise Dremio will give an error at execution time.

User-Defined Functions

A user-defined function, or UDF, is a callable routine that accepts input parameters, executes the function body, and returns a single value or a set of rows.

The UDFs which serve as the basis for filtering and masking policies must be defined independently of your sources. Not only does this allow organizations to use a single policy for multiple tables and views, but this also restricts user access to policies and prevents unauthorized tampering. Modifying a single UDF automatically updates the policy in the context of any tables or views using that access or mask policy.

The following process describes how policies are enforced with Dremio:

  1. A user with the ADMIN role creates a UDF to serve as a security policy.
  2. The administrator then sets the security policy to one or more tables, views, and/or columns.
  3. Dremio enforces the policy at runtime when an end-user performs a query.

Creating UDFs and attaching security policies is done through SQL commands. Policies are applied prior to execution during the query planning phase. At this point, Dremio checks first the table/view for a row-access policy and then each column accessed for a column-masking policy. If any policies are found, they are automatically applied to the policy's scope using the associated UDF in the query plan.

Query Substitutions

Row-access and column-masking function act as an "implicit view," replacing a table/view reference in an SQL statement prior to processing the query. This implicit view is created through an examination of each policy applied to a table, view, or column.

For example, jdoe@dremio.com has SELECT access to table_1. However, the column-masking policy protect_ssn is set for the column_1 column with a UDF to replace all but the last four digits of a social security number with X for anyone that is not a member of the Accounting department, or this user. When they run a query in Dremio that includes this column-masking policy, the following occurs:

  1. During the SQL Planning phase, Dremio identifies which tables, views, and columns are being accessed (table_1) and whether security policies must be enforced.
  2. The engine searches for any security policies set to the associated objects, such as protect_ssn (see Examples of UDFs below).
  3. When the protect_ssn policy is found for the object affected by the query, the query planner immediately modifies the execution path to incorporate the masking function.
  4. Query execution proceeds as normal with the associated UDF included within the execution path.

Listing Existing UDFs

To view all existing UDFs created in Dremio, use the SHOW FUNCTIONS SQL command.

Listing Existing Policies

To view row-access and column-masking policies, use a SELECT statement with the target table/view, system table, and policies specified.

List existing policies
SELECT view_name, masking_policies, row_access_policies FROM sys.views;
SELECT table_name, masking_policies, row_access_policies FROM sys."tables";

To view all column-masking policies set for a given table, use the DESCRIBE TABLE command.

Setting a Policy

To create a row-access or column-masking policy, you must perform the following steps using the associated SQL commands:

  1. Create a new UDF or replace an existing one using the CREATE [OR REPLACE] FUNCTION command.

    Create or replace UDF
    CREATE FUNCTION country_filter (country VARCHAR)
    RETURNS BOOLEAN
    RETURN SELECT query_user()='jdoe@dremio.com' OR (is_member('Accounting') AND country='CA');

    CREATE FUNCTION id_filter (id INT)
    RETURNS BOOLEAN
    RETURN SELECT id = 1;
  2. Create a policy to apply the function using either ADD ROW ACCESS POLICY for row-level access or SET MASKING POLICY for column-masking. These may be used with the CREATE TABLE, CREATE VIEW, ALTER TABLE, and ALTER VIEW commands.

    Create policy to apply function
    -- Add row-access policy
    ALTER TABLE e.employee
    ADD ROW ACCESS POLICY country_filter(country);

    -- Add column-masking policy
    ALTER VIEW e.employee_view
    SET MASKING POLICY protect_ssn (ssn_col, region);

    -- Create table with row policy
    CREATE TABLE e.employee(
    id INTEGER,
    ssn VARCHAR(11),
    country VARCHAR,
    ROW ACCESS POLICY country_filter(country)
    );

    -- Create table with masking policy
    CREATE VIEW e.employee_view(
    ssn_col VARCHAR MASKING POLICY protect_ssn (ssn_col, region),
    region VARCHAR,
    state_col VARCHAR)
    );
note

Both row-access and column-masking UDFs may be applied in a single security policy, or set individually.

Dropping a Policy

To remove a security policy from a table, view, or row, use the UNSET MASKING POLICY or DROP ROW ACCESS POLICY syntax with ALTER TABLE/VIEW.

Drop policy
ALTER TABLE w.employee DROP ROW ACCESS POLICY country_filter(country);
ALTER VIEW e.employees_view MODIFY COLUMN ssn_col UNSET MASKING POLICY protect_ssn;

Examples of UDFs

The following are examples of user-defined functions that you may create with Dremio.

Column-Masking

Redacting SSN

Redact SSN
CREATE FUNCTION
protect_ssn (val VARCHAR)
RETURNS VARCHAR
RETURN
SELECT
CASE
WHEN query_user() IN ('jdoe@dremio.com','janders@dremio.com')
OR is_member('Accounting') THEN val
ELSE CONCAT('XXX-XX-',SUBSTR(val,8,4))
END;

Using Masking & Access Policies

Use masking and access policies
CREATE FUNCTION lower_country(country VARCHAR)
RETURNS VARCHAR
RETURN SELECT lower(country);

CREATE FUNCTION country_filter (country VARCHAR)
RETURNS BOOLEAN
RETURN SELECT query_user()='dremio'
OR (is_member('Accounting')
AND country='CA');

CREATE FUNCTION protect_ssn (ssn VARCHAR(11))
RETURNS VARCHAR(11)
RETURN SELECT CASE WHEN query_user()='dremio' OR is_member('Accounting') THEN ssn
ELSE CONCAT('XXX-XXX-', SUBSTR(ssn,9,3))
END;

CREATE FUNCTION salary_range (salary FLOAT, id INTEGER)
RETURNS BOOLEAN
RETURN SELECT CASE WHEN id > 1 AND salary > 10000 THEN true
ELSE false
END;

Using STRUCT

Use STRUCT
--
CREATE TABLE struct_demo (emp_info struct <name : VARCHAR>);
INSERT INTO nas.struct_demo VALUES(SELECT convert_from('{"name":"a"}', 'json'));
CREATE FUNCTION hello(nameCol struct<name:VARCHAR>) RETURNS struct<name:VARCHAR> RETURN SELECT nameCol;
ALTER TABLE nas.struct_demo MODIFY COLUMN emp_info SET MASKING POLICY "@dremio".hello(emp_info);

Using List

Use list
CREATE FUNCTION hello_country(countryList LIST<VARCHAR>) RETURNS VARCHAR RETURN SELECT 'Hello World';
ALTER TABLE "test.json" MODIFY COLUMN country SET MASKING POLICY "@dremio".hello_country(country);

Row-Access

Using Simple Filter Expressions

Use simple filter expressions
CREATE FUNCTION country_filter (country VARCHAR)
RETURNS BOOLEAN
RETURN SELECT state='CA';

Matching Users

Match users
CREATE FUNCTION query_1(my_value varchar)
RETURNS BOOLEAN
RETURN SELECT CASE
WHEN current_user = 'jdoe@dremio.com' THEN true
ELSE false
END;

Table-Driven Policy Using a Subquery

Use a subquery as a table-driven policy
DROP TABLE $scratch.salesmanagerregions;
CREATE TABLE $scratch.salesmanagerregions (
sales_manager varchar,
sales_region varchar
);

INSERT INTO $scratch.salesmanagerregions
VALUES ('john.smith@example.com', 'WW'),
('jane.doe@example.com', 'NA'),
('viktor.jones@example.com', 'EU');

CREATE TABLE $scratch.revenue (
company varchar,
region varchar,
revenue decimal(18,2)
);

INSERT INTO $scratch.revenue
VALUES ('Acme', 'EU', 2.5),
('Acme', 'NA', 1.5);

CREATE OR REPLACE FUNCTION security.sales_policy (sales_region_in varchar) RETURNS BOOLEAN
RETURN SELECT is_member('sales_executive_role')
OR EXISTS (
SELECT 1 FROM $scratch.salesmanagerregions
WHERE user() = sales_manager
AND sales_region = sales_region_in
);

ALTER TABLE $scratch.revenue
ADD ROW ACCESS POLICY security.sales_policy(region);

SELECT * FROM $scratch.revenue;
-- company, region, revenue
-- Acme, NA, 1.50

Using Reflections on Datasets with Policies

Dremio supports reflection creation on views and tables with row-access and column-masking policies defined on any of the underlying anchor datasets. See the following examples.

Example of a view with a row-access policy and a raw reflection
-- Create nested views
CREATE OR REPLACE VIEW myView AS
SELECT city, state, pop FROM Samples."samples.dremio.com"."zips.json"
WHERE pop > 10000;
CREATE OR REPLACE VIEW myView2 AS
SELECT city, state FROM myView
WHERE STARTS_WITH(city, 'A');

-- Create a raw reflection on the inner view
ALTER TABLE myView
CREATE RAW REFLECTION myReflection
USING DISPLAY(city, state);

-- Query the view after the reflection is created
SELECT * FROM myView2;

-- Create a UDF
CREATE OR REPLACE FUNCTION isMA(state VARCHAR)
RETURNS BOOLEAN
RETURN SELECT CASE WHEN IS_MEMBER('hr') THEN state='MA'
ELSE NULL
END;

-- Add a row-access policy and query the view
ALTER TABLE myView
ADD ROW ACCESS POLICY isMA("state");
SELECT * FROM myView2;

After running the last query, the reflection is used to accelerate the query as shown in the results below:

The Query1 results show that the row-access policy has been applied successfully:

The Query2 results do not appear to those who are not members of HR:

The Query2 results appear to those who are members of HR:

Example of a table with a row-access policy and an aggregation reflection
ALTER TABLE NAS.rcac.employee
ADD ROW ACCESS POLICY is_recent_employee(hire_date);
ALTER TABLE NAS.rcac.employee
CREATE AGGREGATE REFLECTION ar_tvrf_1 USING DIMENSIONS(hire_date);
SELECT MIN(SALARY) FROM NAS.rcac.employee
GROUP BY hire_date;

Limitations

See the following limitations where datasets with row-access and/or column-masking policies cannot support reflections:

Eventual Consistency

Keep in mind that row-access and column-masking policies are “eventually consistent” with reflection management, which means if you take any of the following actions:

  • Modify the row access of column masking policy after the reflection has been created

  • Create a default raw reflection on a view with all the columns selected and add the policy on an anchor dataset after the reflection has been created

Then those changes won’t be accounted for until the reflection is refreshed. See the following example:

Example of the limitation
-- Create nested views
CREATE OR REPLACE VIEW myView AS
SELECT city, state, pop FROM Samples."samples.dremio.com"."zips.json"
WHERE pop > 10000;
CREATE OR REPLACE VIEW myView2 AS
SELECT * FROM myView
WHERE STARTS_WITH(city, 'A');

-- Create a raw reflection on the top view
ALTER TABLE myView2
CREATE RAW REFLECTION myReflection
USING DISPLAY(city, state, pop);

-- Query the view after the reflection is created
SELECT * FROM myView2;

-- Create a UDF
CREATE OR REPLACE FUNCTION isMA(state VARCHAR)
RETURNS BOOLEAN
RETURN SELECT CASE WHEN IS_MEMBER('public') THEN state='MA'
ELSE NULL
END;

-- Add a row-access policy and query the view
ALTER TABLE myView ADD ROW ACCESS POLICY isMA("state");
SELECT * FROM myView2;

Notice that the reflection is matched, but the results are stale and the row-access policy hasn't been applied. After you refresh the reflection and rerun this command:

Example of the query to rerun
SELECT * FROM myView2;

You can see the results are consistent and the row-access policy has been applied.

Policies with Multiple Arguments

If a policy on an anchor dataset contains multiple columns, the reflection created on the view containing the policy fails. See the following example:

Example of the limitation
-- Create tables
CREATE TABLE employees (
id INT,
hire_date DATE,
ssn VARCHAR(11),
name VARCHAR,
country VARCHAR,
salary FLOAT,
job_id INT);
CREATE TABLE jobs (
id INT,
title VARCHAR,
is_good BOOLEAN);

-- Create a view
CREATE VIEW job_salary_in_the_usa AS
SELECT job_id, salary
FROM employees
WHERE country = 'USA';

-- Create a UDF
CREATE OR REPLACE FUNCTION hide_salary_on_bad_job(salary FLOAT, job_id_in INT)
RETURNS BOOLEAN
RETURN SELECT CASE WHEN IS_MEMBER('public') AND (
SELECT is_good FROM jobs j WHERE job_id_in = j.id)
THEN NULL
ELSE salary
END;

-- Add a column-masking policy
ALTER TABLE employees
MODIFY COLUMN salary
SET MASKING POLICY hide_salary_on_bad_job(salary, job_id);

-- Create a raw reflection on the view
ALTER DATASET job_salary_in_the_usa
CREATE RAW REFLECTION job_salary_drr USING DISPLAY(job_id, salary);

In the above example, the job_salary_drr reflection fails to materialize due to the multi-argument policy on test.tables.employees::salary.

Aggregates on Masked Columns

You cannot create a raw reflection on the view if there is a policy defined on the masked column.

Example of the limitation
CREATE OR REPLACE VIEW myView AS
SELECT MIN(salary)
FROM employees

In the above example, there is a policy defined on salary, so you cannot create a reflection on this view.

NULL Generating JOINs

You can only apply the policy if it's on the “join side” of the join, such as:

  • Left side of LEFT JOIN
  • Right side of RIGHT JOIN
  • Either side of INNER JOIN
  • Neither side of FULL OUTER JOIN

If the policy is not on the "join side", the join generates NULL values for all the entries that didn’t match the join condition.

Example of the limitation
CREATE OR REPLACE VIEW myView AS
SELECT emp.department_id, dept.department_name, emp.name
FROM employees as emp
RIGHT JOIN department as dept
ON emp.department_id = dept.department_id

In the above example, there is a policy defined on the employees table, which is on the left side of the RIGHT JOIN, so you cannot create a reflection on this view.

SET Operations

The policy must be defined on all UNION datasets and on the same field.

Example of the limitation
CREATE OR REPLACE VIEW myView AS
SELECT * FROM a
UNION SELECT * FROM employees
UNION SELECT * FROM c

In the above example, there is a policy defined on the employees table, so you cannot create a reflection on this view.

Trimming Projects

In order to create a reflection on a view, the view should reference all the fields that are part of the row-access and column-masking policies.

Example of the limitation
-- Create a UDF
CREATE OR REPLACE FUNCTION isMA(state VARCHAR)
RETURNS BOOLEAN
RETURN SELECT CASE WHEN IS_MEMBER('public') THEN state='MA'
ELSE NULL
END;

-- Create views
CREATE OR REPLACE VIEW myView1 AS
SELECT city, state, pop FROM Samples."samples.dremio.com"."zips.json"
WHERE pop > 10000;

-- Add a row-access policy
ALTER TABLE myView1
ADD ROW ACCESS POLICY isMA("state");

-- Create views
CREATE OR REPLACE VIEW myView2 AS
SELECT * FROM myView1;
CREATE OR REPLACE VIEW myView3 AS
SELECT city, pop FROM myView1;

In the above example, you can create a reflection on myView2 but not on myView3 since it trims the state column from the view which has a policy defined on it.