Reflection Recommendations Enterprise
Recommendations allow you to obtain effective reflections that accelerate your queries in Dremio. You can use recommendations that are automatically generated based on job history or manually requested for a specific set of jobs.
Viewing Automatically Generated Recommendations
When Autonomous Reflections are not enabled, Dremio automatically provides recommendations to add and remove reflections based on query patterns to optimize performance for queries on Iceberg tables, Parquet datasets, and views based on them. Recommendations to add reflections are sorted by overall effectiveness, with the most effective recommendations shown on top. Effectiveness relates to metrics such as the estimated number of accelerated jobs, potential increase in query execution speedup, and potential time saved during querying. These are rough estimates based on past data that can give you insight into the potential benefits of each recommendation. Reflections created using these recommendations refresh automatically when the source data changes on:
- Iceberg tables: when the table is modified through Dremio or other engines. Dremio polls tables every 10 seconds.
- Parquet datasets: when metadata is updated in Dremio.
To view and apply reflection recommendations:
- Navigate to Project Settings.
- Select Reflections.
- Click Reflections Recommendations to access the list of suggested reflections.
- To apply a recommendation, click on the plus icon at the end of the corresponding row.
Reflections created using usage-based recommendations are only used when fully synchronized with their source data to ensure up-to-date query results.
Manually Requesting Recommendations for Specific Jobs
You can generate recommendations for default raw reflections by submitting job IDs to accelerate specific SQL queries using either the SYS.RECOMMEND_REFLECTIONS table function or the Recommendations API. Obtain the job IDs by looking them up on the Jobs page.
Running the SYS.RECOMMEND_REFLECTIONS Table Function
The SYS.RECOMMEND_REFLECTIONS table function returns a table of one or more recommendations.
Syntax
Syntax for the SYS.RECOMMEND_REFLECTIONS table functionSELECT * FROM TABLE(SYS.RECOMMEND_REFLECTIONS(ARRAY['<jobId>', '<jobId>']))
The SYS.RECOMMEND_REFLECTIONS function's argument must be an array literal.
Parameters
The SYS.RECOMMEND_REFLECTIONS table function has one parameter: an array that contains the job IDs for the SQL queries that you want to accelerate.
You can list up to 100 job IDs in each SYS.RECOMMEND_REFLECTIONS query. You must have permission to view every job you list in the query.
Output
The output for SYS.RECOMMEND_REFLECTIONS queries is a table that includes the following columns:
Column Name | Data Type | Description |
---|---|---|
view_sql | varchar | If the recommendation is for a default raw reflection: The value in this field is --Default Raw Reflection .If the recommendation is for an aggregation reflection: An SQL command that creates a view based on the initial SQL query in the referenced job ID. Dremio uses this view to create an aggregation reflection when you run the SQL command from the reflection_sql column. Before running this command, ensure that you have already created an Arctic catalog or folder named recommended_view . |
reflection_sql | varchar | If the recommendation is for a default raw reflection: An SQL command for defining the reflection on an existing view. If the recommendation is for an aggregation reflection: An SQL command that alters and adds an aggregation reflection to the view that is created when you run the SQL command in the view_sql column. Before running this command, ensure that you have run the corresponding command in the view_sql column.Dremio automatically consolidates recommendations for aggregation reflections when possible to create a single recommendation for similar SQL queries, so the output table may not contain an individual row that corresponds to each job ID listed in the SYS.RECOMMEND_REFLECTIONS query. |
job_ids | list | The unique identifier of the jobs that ran the queries for which the recommendation is given. |
reflection_score | double | Dremio's score for the recommended reflection's quality, on a scale of 0 (worst) to 100 (best). The reflection_score value considers the recommended reflection's anticipated quality compared to existing reflections and other recommended reflections, as well as the likely improvement in query run times if the recommended reflection is implemented. |
average_improvement_factor | double | The likely average multiplicative rate of improvement for each query if you implement the recommended reflection. For example, if the average_improvement_factor value is 2.34, implementing the recommended reflection is likely to speed up each query by 2.34 times, on average. |
average_improvement_ms | double | The likely average improvement, in milliseconds, for each query if you implement the recommended reflection. For example, if the average_improvement_ms value is 5400, implementing the recommended reflection is likely to save an average of 5400 milliseconds for each query that uses the reflection. |
Example Query and Output
To request a recommendation, you must have the job ID for at least one SQL query that you want to accelerate. For example, suppose this is the SQL query:
Example SQL QuerySELECT Max("p_size")
FROM s3.bucket1.tpch.sf10.parquet.part
JOIN s3.bucket1.tpch.sf10.parquet.lineitem
ON "p_partkey" = "l_partkey"
GROUP BY "p_brand"
Obtaining Recommendations
-
Retrieve the job ID for the query.
-
Run the SYS.RECOMMEND_REFLECTIONS query with the job ID. In the following example, the job ID is
Example SYS.RECOMMEND_REFLECTIONS Query844c0023-6272-8b16-aef3-aea289acadb1
:SELECT * FROM TABLE(SYS.RECOMMEND_REFLECTIONS(ARRAY['844c0023-6272-8b16-aef3-aea289acadb1']))
The result of the query is returned in this table:
Example Resultsview_sql | reflection_sql | job_ids | reflection_score | average_improvement_factor | average_improvement_ms |
---|---|---|---|---|---|
CREATE VIEW "recommended_view"."Dataset_9d74a03b-747a-42a2-a5ca-7f9c6f77b55d" AS SELECT "part"."P_BRAND" AS "F0[P_BRAND]", "part"."P_SIZE" AS "F1[P_SIZE]", "part"."P_PARTKEY" AS "extra#0", "part"."P_NAME" AS "extra#1", "part"."P_MFGR" AS "extra#2", "part"."P_TYPE" AS "extra#4", "part"."P_CONTAINER" AS "extra#6", "part"."P_RETAILPRICE" AS "extra#7", "part"."P_COMMENT" AS "extra#8", "lineitem"."L_ORDERKEY" AS "extra#9", "lineitem"."L_PARTKEY" AS "extra#10", "lineitem"."L_SUPPKEY" AS "extra#11", "lineitem"."L_LINENUMBER" AS "extra#12", "lineitem"."L_QUANTITY" AS "extra#13", "lineitem"."L_EXTENDEDPRICE" AS "extra#14", "lineitem"."L_DISCOUNT" AS "extra#15", "lineitem"."L_TAX" AS "extra#16", "lineitem"."L_RETURNFLAG" AS "extra#17", "lineitem"."L_LINESTATUS" AS "extra#18", "lineitem"."L_SHIPDATE" AS "extra#19", "lineitem"."L_COMMITDATE" AS "extra#20", "lineitem"."L_RECEIPTDATE" AS "extra#21", "lineitem"."L_SHIPINSTRUCT" AS "extra#22", "lineitem"."L_SHIPMODE" AS "extra#23", "lineitem"."L_COMMENT" AS "extra#24" FROM "s3"."bucket1"."tpch"."sf10"."parquet"."lineitem" INNER JOIN "s3"."bucket1"."tpch"."sf10"."parquet"."part" ON "part"."P_PARTKEY" = "lineitem"."L_PARTKEY" | ALTER DATASET "recommended_view"."Dataset_9d74a03b-747a-42a2-a5ca-7f9c6f77b55d" CREATE AGGREGATE REFLECTION "agg_0e07a376-7f8e-4c68-b2ce-6f6e819bebe6" USING DIMENSIONS ("F0[P_BRAND]") MEASURES ("F1[P_SIZE]" (MAX)) | "6j6c34cf-9drf-b07a-5ab7-abea69a66d00", "1a3c67c0-aab0-f9fb-97b4-af374b520100", "1a3c67c0-db35-3645-9ef1-2a84e4d0ce00" | 50.00 | 10.00 | 7196 |
Using the Recommendation
-
If a recommendation is for an aggregation reflection:
- Create an Arctic catalog or folder named
recommended_view
. - Run the query listed in the
view_sql
column to create a view based on the initial SQL query.
- Create an Arctic catalog or folder named
-
Run the query listed in the
reflection_sql
column to create the recommended reflection and wait for the reflection to finish refreshing. -
When the reflection refresh is complete, run the SQL query to observe the acceleration due to the added reflection.
Limitations
The SYS.RECOMMEND_REFLECTIONS table function has the following limitations:
- The function cannot provide recommendations for jobs with multiple nested queries that contain joins.
- An SQL query can contain only inner joins. Outer joins that are part of a view definition are also supported. Other types of joins are not supported.
- An SQL query cannot contain Window functions.
Sending Requests to the Recommendations API
You can use the Recommendations API to submit the job IDs of jobs that ran SQL queries and receive recommendations for reflections that can accelerate those queries.
- For default raw reflections, each recommendation comprises the path to the view on which to define the reflection and the parameters to use in a request to create the reflection.
- For aggregation reflections, each recommendation comprises the parameters to use in a request to create a view on which to define the recommended reflection and the parameters to use in a request to create the reflection.
After you retrieve the recommended reflections for your queries, use the Catalog API to create the recommended views. Then, use the Reflection API to create the desired reflections.
Before submitting Catalog API requests to create the recommended views for aggregation reflections, create a folder named recommended_view
. In your Catalog API requests, the path
parameter must include the full path to the recommended_view
folder.
Read the Recommendations API reference for more information.