Reflection Recommendations Enterprise
Recommendations allow you to obtain effective reflections that accelerate your queries in Dremio. You can use recommendations that are automatically generated based on job history or manually requested for a specific set of jobs.
Viewing Automatically Generated Recommendations
To accelerate query performance on the most frequently accessed views, Dremio generates a list of recommendations each day based on the jobs that have been executed in the last 7 days.
Dremio suggests up to 10 recommendations at a given time. The number of suggested recommendations varies depending on the data Dremio collects. If Dremio does not have relevant recommendations based on collected data, the list of recommendations is empty.
To view a list of automatically generated recommendations:
- On the Settings page, click Reflections.
- To view a list of recommendations, click in the upper-right corner.
- Click at the end of a row to trigger Dremio to create the reflection. You can monitor the reflection status on the Reflections page.
Recommendations are sorted by overall effectiveness, with the most effective recommendations shown on top. Effectiveness relates to metrics such as the estimated number of accelerated jobs, potential increase in query execution speedup, and potential time saved during querying. These are rough estimates based on past data that can give you insight into the potential benefits of each recommendation.
Manually Requesting Recommendations for Specific Jobs
You can generate recommendations for default raw reflections by submitting job IDs to accelerate specific SQL queries using either the SYS.RECOMMEND_REFLECTIONS table function or the Recommendations API. Obtain the job IDs by looking them up on the Jobs page.
Running the SYS.RECOMMEND_REFLECTIONS Table Function
The SYS.RECOMMEND_REFLECTIONS table function returns a table of one or more recommendations.
Syntax
Syntax for the SYS.RECOMMEND_REFLECTIONS table functionSELECT * FROM TABLE(SYS.RECOMMEND_REFLECTIONS(ARRAY['<jobId>', '<jobId>']))
The SYS.RECOMMEND_REFLECTIONS function's argument must be an array literal.
Parameters
The SYS.RECOMMEND_REFLECTIONS table function has one parameter: an array that contains the job IDs for the SQL queries that you want to accelerate.
You can list up to 100 job IDs in each SYS.RECOMMEND_REFLECTIONS query. You must have permission to view every job you list in the query.
Output
The output for SYS.RECOMMEND_REFLECTIONS queries is a table that includes the following columns:
Column Name | Data Type | Description |
---|---|---|
view_sql | varchar | If the recommendation is for a default raw reflection: The value in this field is --Default Raw Reflection .If the recommendation is for an aggregation reflection: An SQL command that creates a view based on the initial SQL query in the referenced job ID. Dremio uses this view to create an aggregation reflection when you run the SQL command from the reflection_sql column. Before running this command, ensure that you have already created an Arctic catalog or folder named recommended_view . |
reflection_sql | varchar | If the recommendation is for a default raw reflection: An SQL command for defining the reflection on an existing view. If the recommendation is for an aggregation reflection: An SQL command that alters and adds an aggregation reflection to the view that is created when you run the SQL command in the view_sql column. Before running this command, ensure that you have run the corresponding command in the view_sql column.Dremio automatically consolidates recommendations for aggregation reflections when possible to create a single recommendation for similar SQL queries, so the output table may not contain an individual row that corresponds to each job ID listed in the SYS.RECOMMEND_REFLECTIONS query. |
job_ids | list | The unique identifier of the jobs that ran the queries for which the recommendation is given. |
reflection_score | double | Dremio's score for the recommended reflection's quality, on a scale of 0 (worst) to 100 (best). The reflection_score value considers the recommended reflection's anticipated quality compared to existing reflections and other recommended reflections, as well as the likely improvement in query run times if the recommended reflection is implemented. |
average_improvement_factor | double | The likely average multiplicative rate of improvement for each query if you implement the recommended reflection. For example, if the average_improvement_factor value is 2.34, implementing the recommended reflection is likely to speed up each query by 2.34 times, on average. |
average_improvement_ms | double | The likely average improvement, in milliseconds, for each query if you implement the recommended reflection. For example, if the average_improvement_ms value is 5400, implementing the recommended reflection is likely to save an average of 5400 milliseconds for each query that uses the reflection. |
Example Query and Output
To request a recommendation, you must have the job ID for at least one SQL query that you want to accelerate. For example, suppose this is the SQL query:
Example SQL QuerySELECT Max("p_size")
FROM s3.bucket1.tpch.sf10.parquet.part
JOIN s3.bucket1.tpch.sf10.parquet.lineitem
ON "p_partkey" = "l_partkey"
GROUP BY "p_brand"
Obtaining Recommendations
-
Retrieve the job ID for the query.
-
Run the SYS.RECOMMEND_REFLECTIONS query with the job ID. In the following example, the job ID is
Example SYS.RECOMMEND_REFLECTIONS Query844c0023-6272-8b16-aef3-aea289acadb1
:SELECT * FROM TABLE(SYS.RECOMMEND_REFLECTIONS(ARRAY['844c0023-6272-8b16-aef3-aea289acadb1']))
The result of the query is returned in this table:
Example Resultsview_sql | reflection_sql | job_ids | reflection_score | average_improvement_factor | average_improvement_ms |
---|---|---|---|---|---|
CREATE VIEW "recommended_view"."Dataset_9d74a03b-747a-42a2-a5ca-7f9c6f77b55d" AS SELECT "part"."P_BRAND" AS "F0[P_BRAND]", "part"."P_SIZE" AS "F1[P_SIZE]", "part"."P_PARTKEY" AS "extra#0", "part"."P_NAME" AS "extra#1", "part"."P_MFGR" AS "extra#2", "part"."P_TYPE" AS "extra#4", "part"."P_CONTAINER" AS "extra#6", "part"."P_RETAILPRICE" AS "extra#7", "part"."P_COMMENT" AS "extra#8", "lineitem"."L_ORDERKEY" AS "extra#9", "lineitem"."L_PARTKEY" AS "extra#10", "lineitem"."L_SUPPKEY" AS "extra#11", "lineitem"."L_LINENUMBER" AS "extra#12", "lineitem"."L_QUANTITY" AS "extra#13", "lineitem"."L_EXTENDEDPRICE" AS "extra#14", "lineitem"."L_DISCOUNT" AS "extra#15", "lineitem"."L_TAX" AS "extra#16", "lineitem"."L_RETURNFLAG" AS "extra#17", "lineitem"."L_LINESTATUS" AS "extra#18", "lineitem"."L_SHIPDATE" AS "extra#19", "lineitem"."L_COMMITDATE" AS "extra#20", "lineitem"."L_RECEIPTDATE" AS "extra#21", "lineitem"."L_SHIPINSTRUCT" AS "extra#22", "lineitem"."L_SHIPMODE" AS "extra#23", "lineitem"."L_COMMENT" AS "extra#24" FROM "s3"."bucket1"."tpch"."sf10"."parquet"."lineitem" INNER JOIN "s3"."bucket1"."tpch"."sf10"."parquet"."part" ON "part"."P_PARTKEY" = "lineitem"."L_PARTKEY" | ALTER DATASET "recommended_view"."Dataset_9d74a03b-747a-42a2-a5ca-7f9c6f77b55d" CREATE AGGREGATE REFLECTION "agg_0e07a376-7f8e-4c68-b2ce-6f6e819bebe6" USING DIMENSIONS ("F0[P_BRAND]") MEASURES ("F1[P_SIZE]" (MAX)) | "6j6c34cf-9drf-b07a-5ab7-abea69a66d00", "1a3c67c0-aab0-f9fb-97b4-af374b520100", "1a3c67c0-db35-3645-9ef1-2a84e4d0ce00" | 50.00 | 10.00 | 7196 |
Using the Recommendation
-
If a recommendation is for an aggregation reflection:
- Create an Arctic catalog or folder named
recommended_view
. - Run the query listed in the
view_sql
column to create a view based on the initial SQL query.
- Create an Arctic catalog or folder named
-
Run the query listed in the
reflection_sql
column to create the recommended reflection and wait for the reflection to finish refreshing. -
When the reflection refresh is complete, run the SQL query to observe the acceleration due to the added reflection.
Limitations
The SYS.RECOMMEND_REFLECTIONS table function has the following limitations:
- The function cannot provide recommendations for jobs with multiple nested queries that contain joins.
- An SQL query can contain only inner joins. Outer joins that are part of a view definition are also supported. Other types of joins are not supported.
- An SQL query cannot contain Window functions.
Sending Requests to the Recommendations API
You can use the Recommendations API to submit the job IDs of jobs that ran SQL queries and receive recommendations for reflections that can accelerate those queries.
- For default raw reflections, each recommendation comprises the path to the view on which to define the reflection and the parameters to use in a request to create the reflection.
- For aggregation reflections, each recommendation comprises the parameters to use in a request to create a view on which to define the recommended reflection and the parameters to use in a request to create the reflection.
After you retrieve the recommended reflections for your queries, use the Catalog API to create the recommended views. Then, use the Reflection API to create the desired reflections.
Before submitting Catalog API requests to create the recommended views for aggregation reflections, create a folder named recommended_view
. In your Catalog API requests, the path
parameter must include the full path to the recommended_view
folder.
Read the Recommendations API reference for more information.