Skip to main content

Accelerate Queries Using Reflections

Queries are fast on the nyc_trips table but queries on this table triggered from dashboards need to be sub-second.

To optimize queries and get sub-second response times on 330+ million rows, you will create a data reflection. Data reflections organize and optimize the data close to Dremio Sonar’s query engine, using techniques such as pre-computation, columnarization, compression, sorting, partitioning, and data aggregation.

Step 1. Query the Table

Before adding a reflection, query the dataset and make a note of the runtime so you can compare it to the runtime after a reflection is in place.

  1. Click This is the icon that represents the SQL runner. in the side navigation bar.

  2. To query the nyc_trips, copy and paste the SQL below and click Run.

    Query the table
    SELECT   DATE_PART('DAY', "catalog_name"."my_folder"."nyc_trips"."pickup_datetime") AS "pickup_day",
    SUM("catalog_name"."my_folder"."nyc_trips"."passenger_count") AS "passenger_count"
    FROM "catalog_name"."my_folder"."nyc_trips"
    GROUP BY pickup_day
    ORDER BY pickup_day;

    Notice that results were returned quickly, but you can further optimize so that your users get sub-second response if similar analytics are done through a BI tool.

    Let’s add an aggregation reflection to the nyc_trips table.

Step 2. Create an Aggregation Reflection

To accelerate your queries, create an aggregation reflection:

  1. Locate the nyc_trips table under my_folder in the Data panel on the left. Hover over the nyc_trips and click on the metadata card.
note

Alternatively, you can navigate to the Datasets page, hover over the "catalog_name"."my_folder"."nyc_trips" and click on the right.

  1. In the Reflections tab, under Aggregation Reflections, remove passenger_count from Dimensions so that pickup_datetime is the only Dimension.

  2. For Measures, add passenger_count to have all columns except for pickup_datetime listed as measures.

  1. Toggle on the Aggregation Reflections and click Save.

    It takes only a few seconds to create this reflection. Creating a reflection on the base table allows views created on top of that data to take advantage of the acceleration.

    note

    The reflection can also be created using the following SQL command:

    ALTER TABLE                 "catalog_name"."my_folder"."nyc_trips"
    CREATE AGGREGATE REFLECTION "taxi_reflection"
    USING DIMENSIONS ("pickup_datetime")
    MEASURES (passenger_count, trip_distance_mi, fare_amount, tip_amount, total_amount);
  2. While you’re waiting for the reflection to be created, click on the history link in the Reflections tab.

This will navigate you to the job that was triggered to create the reflection. Once the reflection has been created, you’ll see the Completed icon to indicate that the job has completed.

Step 3. Re-run the Query to See the Acceleration

Once the reflection has been created, run the query below in the SQL Runner, which takes advantage of the aggregate reflection.

The reflection was created on the nyc_trips table but will also accelerate queries on views built on this table.

When a query like this is triggered from the SQL Runner or a dashboard, data from all 330+ million rows is used to return the result.

Re-run the query on the table
SELECT   DATE_PART('DAY', "catalog_name"."my_folder"."nyc_trips"."pickup_datetime") AS "pickup_day",
SUM("catalog_name"."my_folder"."nyc_trips"."passenger_count") AS "passenger_count"
FROM "catalog_name"."my_folder"."nyc_trips"
GROUP BY pickup_day
ORDER BY pickup_day;

Notice the query runs in less than a second. When you create a reflection on your own data, the reflection will refresh automatically as the dataset gets updated so that the reflection continues to accelerate queries without manual intervention.

Now that you’ve configured a reflection and gotten runtimes to sub-second, see how the same reflection accelerates a dashboard.

Next: Visualize Data Using Tableau or Power BI