Version: current [26.x]

Runtime Filtering

note

Runtime filtering applies only to Parquet and Iceberg datasets.

Runtime filtering improves query performance by reducing the number of records that need to be processed during a join. This is achieved by creating a filter from the join keys and applying it dynamically to the scanned (probed) side of the join. Runtime filters can be applied to both partition and non-partition columns. They are most effective when the set of join key values have a low cardinality compared to the number of records in the probed table.

Runtime Filtering Example

Suppose you have a large fact table named sales that records all the sales from an E-commerce site and is partitioned on the order_date column. You also have a smaller unpartitioned dimension table named customers that includes a signup_date column. The sales table has hundreds of millions of records, whereas customers has tens of millions.

The following example demonstrates how to join the sales and customers tables on the partitioned and non-partitioned columns.

Fact and dimension tables joined on partitioned and non-partitioned columns

SELECT
   s.*
FROM
   sales s
JOIN
   customers c
   ON (s.customer_id = c.customer_id AND s.order_data = c.signup_date)
WHERE c.signup_date ≥ DATE ‘2023-12-01’

This query will have effective runtime filtering because it only considers customers who have signed up since December 2023, so the set of join key values in the filter will be relatively small. Also, the query only concerns orders made on the same day a customer signed up, so entire order_date partitions in sales will be filtered out.