Accelerate Queries Using Reflections
Queries are fast on the nyc_trips
table but queries on this table triggered from dashboards need to be sub-second.
To optimize queries and get sub-second response times on 330+ million rows, you will create a data reflection. Data reflections organize and optimize the data close to Dremio Sonar’s query engine, using techniques such as pre-computation, columnarization, compression, sorting, partitioning, and data aggregation.
Step 1. Query the Table
Before adding a reflection, query the dataset and make a note of the runtime so you can compare it to the runtime after a reflection is in place.
-
Click in the side navigation bar.
-
To query the
Query the tablenyc_trips
, copy and paste the SQL below and click Run.SELECT DATE_PART('DAY', "catalog_name"."my_folder"."nyc_trips"."pickup_datetime") AS "pickup_day",
SUM("catalog_name"."my_folder"."nyc_trips"."passenger_count") AS "passenger_count"
FROM "catalog_name"."my_folder"."nyc_trips"
GROUP BY pickup_day
ORDER BY pickup_day;Notice that results were returned quickly, but you can further optimize so that your users get sub-second response if similar analytics are done through a BI tool.
Let’s add an aggregation reflection to the
nyc_trips
table.
Step 2. Create an Aggregation Reflection
To accelerate your queries, create an aggregation reflection:
- Locate the
nyc_trips
table undermy_folder
in the Data panel on the left. Hover over thenyc_trips
and click on the metadata card.
Alternatively, you can navigate to the Datasets page, hover over the "catalog_name"."my_folder"."nyc_trips"
and click on the right.
-
In the Reflections tab, under Aggregation Reflections, remove
passenger_count
from Dimensions so thatpickup_datetime
is the only Dimension. -
For Measures, add
passenger_count
to have all columns except forpickup_datetime
listed as measures.
-
Toggle on the Aggregation Reflections and click Save.
It takes only a few seconds to create this reflection. Creating a reflection on the base table allows views created on top of that data to take advantage of the acceleration.
noteThe reflection can also be created using the following SQL command:
ALTER TABLE "catalog_name"."my_folder"."nyc_trips"
CREATE AGGREGATE REFLECTION "taxi_reflection"
USING DIMENSIONS ("pickup_datetime")
MEASURES (passenger_count, trip_distance_mi, fare_amount, tip_amount, total_amount); -
While you’re waiting for the reflection to be created, click on the history link in the Reflections tab.
This will navigate you to the job that was triggered to create the reflection. Once the reflection has been created, you’ll see the Completed icon to indicate that the job has completed.
Step 3. Re-run the Query to See the Acceleration
Once the reflection has been created, run the query below in the SQL Runner, which takes advantage of the aggregate reflection.
The reflection was created on the nyc_trips
table but will also accelerate queries on views built on this table.
When a query like this is triggered from the SQL Runner or a dashboard, data from all 330+ million rows is used to return the result.
Re-run the query on the tableSELECT DATE_PART('DAY', "catalog_name"."my_folder"."nyc_trips"."pickup_datetime") AS "pickup_day",
SUM("catalog_name"."my_folder"."nyc_trips"."passenger_count") AS "passenger_count"
FROM "catalog_name"."my_folder"."nyc_trips"
GROUP BY pickup_day
ORDER BY pickup_day;
Notice the query runs in less than a second. When you create a reflection on your own data, the reflection will refresh automatically as the dataset gets updated so that the reflection continues to accelerate queries without manual intervention.
Now that you’ve configured a reflection and gotten runtimes to sub-second, see how the same reflection accelerates a dashboard.