Skip to main content
Version: 24.3.x

Dremio-Native: Row-Access & Column-Masking Policies

Row-access and column-masking policies may be applied to tables, views, and columns via user-defined functions (UDFs).

This allows you to filter sensitive data based upon the rules and conditions you need to maintain compliance or adhere to regulatory requirements, while also removing the need to produce a secondary set of data with protected information manually removed.

note

Dremio only supports one data governance policy manager at a time, so you can use either Dremio or Ranger as a policy manager but not both at the same time.

When adding a new Hive source, you have the following options for Hive authorization clients:

  • Storage-Based with User Impersonation
  • SQL-Based
  • Ranger-Based

Requirements

The following source types are supported:

  • Object Storage - S3, Azure Storage, GCS, HDFS
  • Metastores - AWS Glue, Hive Metastore
  • Databases - Oracle, SQL Server, etc.

The following restrictions apply:

  • Only users with the ADMIN role may create UDFs
  • A function may only have one owner (the user that created the UDF, by default), which may be transferred using the GRANT OWNERSHIP command
  • The owner of a UDF that serves as a policy must have the EXECUTE privilege for that UDF.
  • Use of the is_member function shown in the examples below is not currently available to organizations using SSO role authentication.

Column-Masking

Column-masking is a way to mask-–or scramble–-private data at the column-level dynamically prior to query execution. For example, the owner of a table or view may apply a policy to a column to only display the year of a date or the last four digits of a credit card.

Column-masking policies may be any UDF with a scalar return type that is identical to the data type of the column on which it is applied. However, only one column-masking policy may be applied to each column.

In the following example of a user-defined function, only users within in the Accounting department in the state of California (CA) may see an entry's social security number (ssn) if the record lists an income above $10,000, otherwise the SSN value is masked with XXX-XX-.

Column-masking
CREATE FUNCTION protect_ssn (ssn VARCHAR(11))
RETURNS VARCHAR(11)
RETURN SELECT CASE WHEN query_user()='jdoe@dremio.com' OR is_member('Accounting') THEN ssn
ELSE CONCAT('XXX-XXX-', SUBSTR(ssn,9,3))
END;

Row-Access

Row-access policies are a way to control which records in a table or view are returned for specific users and roles. For example, the owner of a table or view may apply a policy that filters out customers from a specific country unless the user running the query has a specific role.

Row-access
CREATE FUNCTION country_filter (country VARCHAR)
RETURNS BOOLEAN
RETURN SELECT query_user()='jdoe@dremio.com' OR (is_member('Accounting') AND country='CA');

Row-access policies may be any boolean UDF applied to the table or view. The return value of the UDF is treated logically in a query as an AND operator included in a WHERE clause. The return type of the UDF must be BOOLEAN, otherwise Dremio will give an error at execution time.

User-Defined Functions

A user-defined function, or UDF, is a callable routine that accepts input parameters, executes the function body, and returns a single value or a set of rows.

The UDFs which serve as the basis for filtering and masking policies must be defined independently of your sources. Not only does this allow organizations to use a single policy for multiple tables and views, but this also restricts user access to policies and prevents unauthorized tampering. Modifying a single UDF automatically updates the policy in the context of any tables or views using that access or mask policy.

The following process describes how policies are enforced with Dremio:

  1. A user with the ADMIN role creates a UDF to serve as a security policy.
  2. The administrator then sets the security policy to one or more tables, views, and/or columns.
  3. Dremio enforces the policy at runtime when an end-user performs a query.

Creating UDFs and attaching security policies is done through SQL commands. Policies are applied prior to execution during the query planning phase. At this point, Dremio checks first the table/view for a row-access policy and then each column accessed for a column-masking policy. If any policies are found, they are automatically applied to the policy's scope using the associated UDF in the query plan.

Query Substitutions

Row-access and column-masking function act as an "implicit view," replacing a table/view reference in an SQL statement prior to processing the query. This implicit view is created through an examination of each policy applied to a table, view, or column.

For example, jdoe@dremio.com has SELECT access to table_1. However, the column-masking policy protect_ssn is set for the column_1 column with a UDF to replace all but the last four digits of a social security number with X for anyone that is not a member of the Accounting department, or this user. When they run a query in Dremio that includes this column-masking policy, the following occurs:

  1. During the SQL Planning phase, Dremio identifies which tables, views, and columns are being accessed (table_1) and whether security policies must be enforced.
  2. The engine searches for any security policies set to the associated objects, such as protect_ssn (see Examples of UDFs below).
  3. When the protect_ssn policy is found for the object affected by the query, the query planner immediately modifies the execution path to incorporate the masking function.
  4. Query execution proceeds as normal with the associated UDF included within the execution path.

Listing Existing UDFs

To view all existing UDFs created in Dremio, use the SHOW FUNCTIONS SQL command.

Listing Existing Policies

To view row-access and column-masking policies, use a SELECT statement with the target table/view, system table, and policies specified.

List existing policies
SELECT view_name, masking_policies, row_access_policies FROM sys.views;
SELECT table_name, masking_policies, row_access_policies FROM sys."tables";

To view all column-masking policies set for a given table, use the DESCRIBE TABLE command.

Setting a Policy

To create a row-access or column-masking policy, you must perform the following steps using the associated SQL commands:

  1. Create a new UDF or replace an existing one using the CREATE [OR REPLACE] FUNCTION command.

    Create or replace UDF
    CREATE FUNCTION country_filter (country VARCHAR)
    RETURNS BOOLEAN
    RETURN SELECT query_user()='jdoe@dremio.com' OR (is_member('Accounting') AND country='CA');

    CREATE FUNCTION id_filter (id INT)
    RETURNS BOOLEAN
    RETURN SELECT id = 1;
  2. Grant the EXECUTE privilege to the owner of the UDF.

    Grant EXECUTE privilege
    GRANT EXECUTE ON FUNCTION country_filter TO user 'UDF_owner@dremio.com';
  3. Create a policy to apply the function using either ADD ROW ACCESS POLICY for row-level access or SET MASKING POLICY for column-masking. These may be used with the CREATE TABLE, CREATE VIEW, ALTER TABLE, and ALTER VIEW commands.

    Create policy to apply function
    -- Add row-access policy
    ALTER TABLE e.employee
    ADD ROW ACCESS POLICY country_filter(country);

    -- Add column-masking policy
    ALTER VIEW e.employee_view
    SET MASKING POLICY protect_ssn (ssn_col, region);

    -- Create table with row policy
    CREATE TABLE e.employee(
    id INTEGER,
    ssn VARCHAR(11),
    country VARCHAR,
    ROW ACCESS POLICY country_filter(country)
    );

    -- Create table with masking policy
    CREATE VIEW e.employee_view(
    ssn_col VARCHAR MASKING POLICY protect_ssn (ssn_col, region),
    region VARCHAR,
    state_col VARCHAR)
    );
note

Both row-access and column-masking UDFs may be applied in a single security policy, or set individually.

Dropping a Policy

To remove a security policy from a table, view, or row, use the UNSET MASKING POLICY or DROP ROW ACCESS POLICY syntax with ALTER TABLE/VIEW.

Drop policy
ALTER TABLE w.employee DROP ROW ACCESS POLICY country_filter(country);
ALTER VIEW e.employees_view MODIFY COLUMN ssn_col UNSET MASKING POLICY protect_ssn;

Examples of UDFs

The following are examples of user-defined functions that you may create with Dremio.

Column-Masking

Redacting SSN

Redact SSN
CREATE FUNCTION
protect_ssn (val VARCHAR)
RETURNS VARCHAR
RETURN
SELECT
CASE
WHEN query_user() IN ('jdoe@dremio.com','janders@dremio.com')
OR is_member('Accounting') THEN val
ELSE CONCAT('XXX-XX-',SUBSTR(val,8,4))
END;

Using Masking & Access Policies

Use masking and access policies
CREATE FUNCTION lower_country(country VARCHAR)
RETURNS VARCHAR
RETURN SELECT lower(country);

CREATE FUNCTION country_filter (country VARCHAR)
RETURNS BOOLEAN
RETURN SELECT query_user()='dremio'
OR (is_member('Accounting')
AND country='CA');

CREATE FUNCTION protect_ssn (ssn VARCHAR(11))
RETURNS VARCHAR(11)
RETURN SELECT CASE WHEN query_user()='dremio' OR is_member('Accounting') THEN ssn
ELSE CONCAT('XXX-XXX-', SUBSTR(ssn,9,3))
END;

CREATE FUNCTION salary_range (salary FLOAT, id INTEGER)
RETURNS BOOLEAN
RETURN SELECT CASE WHEN id > 1 AND salary > 10000 THEN true
ELSE false
END;

Using STRUCT

Use STRUCT
-- 
CREATE TABLE struct_demo (emp_info struct <name : VARCHAR>);
INSERT INTO nas.struct_demo VALUES(SELECT convert_from('{"name":"a"}', 'json'));
CREATE FUNCTION hello(nameCol struct<name:VARCHAR>) RETURNS struct<name:VARCHAR> RETURN SELECT nameCol;
ALTER TABLE nas.struct_demo MODIFY COLUMN emp_info SET MASKING POLICY "@dremio".hello(emp_info);

Using List

Use list
CREATE FUNCTION hello_country(countryList LIST<VARCHAR>) RETURNS VARCHAR RETURN SELECT 'Hello World';
ALTER TABLE "test.json" MODIFY COLUMN country SET MASKING POLICY "@dremio".hello_country(country);

Row-Access

Using Simple Filter Expressions

Use simple filter expressions
CREATE FUNCTION country_filter (country VARCHAR)
RETURNS BOOLEAN
RETURN SELECT state='CA';

Matching Users

Match users
CREATE FUNCTION query_1(my_value varchar)
RETURNS BOOLEAN
RETURN SELECT CASE
WHEN current_user = 'jdoe@dremio.com' THEN true
ELSE false
END;

Table-Driven Policy Using a Subquery

Use a subquery as a table-driven policy
DROP TABLE <catalog-name>.salesmanagerregions;
CREATE TABLE <catalog-name>.salesmanagerregions (
sales_manager varchar,
sales_region varchar
);

INSERT INTO <catalog-name>.salesmanagerregions
VALUES ('john.smith@example.com', 'WW'),
('jane.doe@example.com', 'NA'),
('viktor.jones@example.com', 'EU');

CREATE TABLE <catalog-name>.revenue (
company varchar,
region varchar,
revenue decimal(18,2)
);

INSERT INTO <catalog-name>.revenue
VALUES ('Acme', 'EU', 2.5),
('Acme', 'NA', 1.5);

CREATE OR REPLACE FUNCTION security.sales_policy (sales_region_in varchar) RETURNS BOOLEAN
RETURN SELECT is_member('sales_executive_role')
OR EXISTS (
SELECT 1 FROM <catalog-name>.salesmanagerregions
WHERE user() = sales_manager
AND sales_region = sales_region_in
);

ALTER TABLE <catalog-name>.revenue
ADD ROW ACCESS POLICY security.sales_policy(region);

SELECT * FROM <catalog-name>.revenue;
-- company, region, revenue
-- Acme, NA, 1.50