Reflections
A reflection is an optimized materialization of source data or a query, similar to a materialized view, that is derived from an existing virtual or physical dataset. For more information about reflections, see Accelerating Queries with Reflections.
Creating Raw Reflections
ALTER DATASET <dataset_path>
CREATE RAW REFLECTION <reflection_name>
USING
DISPLAY (
<field_name>,
<field_name>,
...
)
[PARTITION BY (<field_name>, <field_name>, ...)]
[LOCALSORT BY (<field_name>, <field_name>, ...)]
[ARROW CACHE]
Parameters
<dataset_path>
String
The path of the virtual or physical dataset that the new raw reflection will be based on.
<reflection_name>
String
The name to give to the new reflection.
DISPLAY (<field_name>, <field_name>, ...)
String
The fields to include in the reflection.
PARTITION BY (<field_name>, <field_name>, ...)
String
(Optional) The fields on which to partition the data horizontally in the reflection.
LOCALSORT BY (<field_name>, <field_name>, ...)
String
(Optional) The fields on which to sort the data that is in the reflection.
ARROW CACHE
(Optional) Specifies that you want Dremio to convert data from your reflection’s Parquet files to the Apache Arrow format when copying that data to executor nodes. Normally, Dremio copies data as-is from the Parquet files to caches on executor nodes, which are the nodes that carry out the query plans devised by the query optimizer. Enabling this option can further improve query performance. However, data in the Apache Arrow format requires more space on the executor nodes than data in the default format. You can use this option with the following types of distributed data storage (see the example after this list):
- Amazon Simple Storage Service (S3)
- S3-compatible object storage
- HDFS
- Microsoft Azure Data Lake Storage
- Microsoft Azure Storage
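For instance, a raw reflection definition that enables this option might look like the following sketch. The dataset path, reflection name, and field names are hypothetical placeholders, not part of the syntax above.
-- Hypothetical dataset path and fields; ARROW CACHE is the option being demonstrated
ALTER DATASET "@user1"."orders"
CREATE RAW REFLECTION orders_arrow_cached
USING
DISPLAY (
id,
orderDate,
amount,
region
)
PARTITION BY (region)
ARROW CACHE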
Example
ALTER DATASET "@user1"."customers"
CREATE RAW REFLECTION customers_by_country
USING
DISPLAY (
id,
lastName,
firstName,
address,
country
)
LOCALSORT BY (lastName)
PARTITION BY (country)
Creating Aggregation Reflections
ALTER TABLE <dataset_path>
CREATE AGGREGATE REFLECTION <reflection_name>
USING
DIMENSIONS (<field_name>, <field_name>, ...)
MEASURES (<field_name> (<aggregation_type>), <field_name> (<aggregation_type>), ...)
[PARTITION BY (<field_name>, <field_name>, ...)]
[LOCALSORT BY (<field_name>, <field_name>, ...)]
[ARROW CACHE]
Parameters
<dataset_path>
String
The path of the virtual or physical dataset that the new aggregation reflection will be based on.
<reflection_name>
String
The name to give to the new reflection.
DIMENSIONS (<field_name>, <field_name>, ...)
String
The fields to include as dimensions in the reflection.
MEASURES (<field_name> (<aggregation_type>), <field_name> (<aggregation_type>), ...)
String
The fields to include as measures in the reflection, and the type of aggregation to perform on them. The possible types are COUNT, MIN, MAX, SUM, and APPROXIMATE COUNT DISTINCT.
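As a sketch of how different aggregation types can be combined, the following statement defines one SUM measure and one APPROXIMATE COUNT DISTINCT measure. The dataset path, reflection name, and field names are hypothetical.
-- Hypothetical sales dataset; each measure field is paired with one aggregation type
ALTER TABLE "@user1"."sales"
CREATE AGGREGATE REFLECTION sales_by_region
USING
DIMENSIONS (region)
MEASURES (amount (SUM), customer_id (APPROXIMATE COUNT DISTINCT))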
PARTITION BY (<field_name>, <field_name>, ...)
String
(Optional) The fields on which to partition the data horizontally in the reflection.
LOCALSORT BY (<field_name>, <field_name>, ...)
String
(Optional) The fields on which to sort the data that is in the reflection.
ARROW CACHE
(Optional) Specifies that you want Dremio to convert data from your reflection’s Parquet files to the Apache Arrow format when copying that data to executor nodes. Normally, Dremio copies data as-is from the Parquet files to caches on executor nodes, which are the nodes that carry out the query plans devised by the query optimizer. Enabling this option can further improve query performance. However, data in the Apache Arrow format requires more space on the executor nodes than data in the default format. You can use this option with the following types of distributed data storage (see the example after this list):
- Amazon Simple Storage Service (S3)
- S3-compatible object storage
- HDFS
- Microsoft Azure Data Lake Storage
- Microsoft Azure Storage
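The following sketch applies the option to an aggregation reflection. The dataset path, reflection name, and field names are hypothetical.
-- Hypothetical events dataset; ARROW CACHE appears after the optional partition clause
ALTER TABLE "@user1"."web_events"
CREATE AGGREGATE REFLECTION events_per_day
USING
DIMENSIONS (event_date, event_type)
MEASURES (session_id (APPROXIMATE COUNT DISTINCT))
PARTITION BY (event_date)
ARROW CACHE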
Example
ALTER TABLE Samples."samples.dremio.com"."zips.json"
CREATE AGGREGATE REFLECTION per_state
USING
DIMENSIONS (state)
MEASURES (city (COUNT))
LOCALSORT BY (state)