Runtime Filtering
Runtime filtering applies only to Parquet and Iceberg datasets.
Runtime filtering improves query performance by reducing the number of records that need to be processed during a join. This is achieved by creating a filter from the join keys and applying it dynamically to the scanned (probed) side of the join. Runtime filters can be applied to both partition and non-partition columns. They are most effective when the set of join key values have a low cardinality compared to the number of records in the probed table.
Runtime Filtering Example
Suppose you have a large fact table named sales
that records all the sales from an E-commerce site and is partitioned on the order_date
column. You also have a smaller unpartitioned dimension table named customers
that includes a signup_date
column. The sales
table has hundreds of millions of records, whereas customers
has tens of millions.
The following example demonstrates how to join the sales
and customers
tables on the partitioned and non-partitioned columns.
SELECT
s.*
FROM
sales s
JOIN
customers c
ON (s.customer_id = c.customer_id AND s.order_data = c.signup_date)
WHERE c.signup_date ≥ DATE ‘2023-12-01’
This query will have effective runtime filtering because it only considers customers who have signed up since December 2023, so the set of join key values in the filter will be relatively small. Also, the query only concerns orders made on the same day a customer signed up, so entire order_date
partitions in sales
will be filtered out.