    Reflections

    A reflection is an optimized materialization of source data or a query, similar to a materialized view, that is derived from an existing table or view. For more information about reflections, see Accelerating Queries with Reflections.

    Creating Raw Reflections

    Syntax
    ALTER DATASET <dataset_path> 
    CREATE RAW REFLECTION <reflection_name> 
    USING
    DISPLAY (
      <field_name>,
      <field_name>,
      ...
    )
    [PARTITION BY (<field_name>, <field_name>, ...)]
    [LOCALSORT BY (<field_name>, <field_name>, ...)]
    [ARROW CACHE]
    

    Parameters

    <dataset_path>

    String

    The path of the virtual or physical dataset that the new raw reflection will be based on.


    <reflection_name>

    String

    The name to give to the new reflection.


    DISPLAY (<field_name>, <field_name>, ...)

    String

    The fields to include in the reflection.


    PARTITION BY (<field_name>, <field_name>, ...)

    String

    Optional

    The fields on which to partition the data horizontally in the reflection.


    LOCALSORT BY (<field_name>, <field_name>, ...)

    String

    Optional

    The fields on which to sort the data that is in the reflection.


    ARROW CACHE

    Optional

    Specifies that you want Dremio to convert data from your reflection’s Parquet files to the Apache Arrow format when copying that data to executor nodes. Normally, Dremio copies data as-is from the Parquet files to caches on executor nodes, which are the nodes that carry out the query plans devised by the query optimizer. Enabling this option can further improve query performance. However, data in the Apache Arrow format requires more space on the executor nodes than data in the default format. You can use this option with the following types of distributed data storage:

    • Amazon Simple Storage Service (S3)
    • S3-compatible object storage
    • HDFS
    • Microsoft Azure Data Lake Storage
    • Microsoft Azure Storage


    Example

    Create a raw reflection that sorts customers by last name and partitions them by country
    ALTER DATASET "@user1"."customers"
    CREATE RAW REFLECTION customers_by_country
    USING
    DISPLAY (
      id,
      lastName,
      firstName,
      address,
      country
    )
    PARTITION BY (country)
    LOCALSORT BY (lastName)
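
As a variation on the example above, the following sketch adds the optional ARROW CACHE clause. The reflection name customers_arrow_cached is hypothetical, and the field list is shortened only for brevity. Whether the option helps depends on your workload and on the space available on executor nodes.

```sql
-- Sketch only: customers_arrow_cached is a hypothetical reflection name.
-- Same dataset as the example above, with ARROW CACHE enabled.
ALTER DATASET "@user1"."customers"
CREATE RAW REFLECTION customers_arrow_cached
USING
DISPLAY (
  id,
  lastName,
  country
)
PARTITION BY (country)
LOCALSORT BY (lastName)
ARROW CACHE
```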
    

    Creating Aggregation Reflections

    Syntax
    ALTER TABLE <dataset_path> 
    CREATE AGGREGATE REFLECTION <reflection_name> 
    USING 
    DIMENSIONS (<field_name>, <field_name>, ...) 
    MEASURES (<field_name> (<aggregation_type>), <field_name> (<aggregation_type>), ...) 
    [PARTITION BY (<field_name>, <field_name>, ...)]
    [LOCALSORT BY (<field_name>, <field_name>, ...)]
    [ARROW CACHE]
    

    Parameters

    <dataset_path>

    String

    The path of the virtual or physical dataset that the new aggregation reflection will be based on.


    <reflection_name>

    String

    The name to give to the new reflection.


    DIMENSIONS (<field_name>, <field_name>, ...)

    String

    The fields to include as dimensions in the reflection.


    MEASURES (<field_name> (<aggregation_type>), <field_name> (<aggregation_type>), ...)

    String

    The fields to include as measures in the reflection, and the type of aggregation to perform on them. The possible types are COUNT, MIN, MAX, SUM, and APPROXIMATE COUNT DISTINCT.


    PARTITION BY (<field_name>, <field_name>, ...)

    String

    Optional

    The fields on which to partition the data horizontally in the reflection.


    LOCALSORT BY (<field_name>, <field_name>, ...)

    String

    Optional

    The fields on which to sort the data that is in the reflection.


    ARROW CACHE

    Optional

    Specifies that you want Dremio to convert data from your reflection’s Parquet files to the Apache Arrow format when copying that data to executor nodes. Normally, Dremio copies data as-is from the Parquet files to caches on executor nodes, which are the nodes that carry out the query plans devised by the query optimizer. Enabling this option can further improve query performance. However, data in the Apache Arrow format requires more space on the executor nodes than data in the default format. You can use this option with the following types of distributed data storage:

    • Amazon Simple Storage Service (S3)
    • S3-compatible object storage
    • HDFS
    • Microsoft Azure Data Lake Storage
    • Microsoft Azure Storage


    Example

    Create an aggregation reflection that counts the number of cities per state in which a company has a franchise and sorts the result by state
    ALTER TABLE Samples."samples.dremio.com"."zips.json" 
    CREATE AGGREGATE REFLECTION per_state 
    USING 
    DIMENSIONS (state) 
    MEASURES (city (COUNT)) 
    LOCALSORT BY (state)
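
The MEASURES clause accepts more than one field, each with its own aggregation type. The sketch below extends the example above under two assumptions: that the sample dataset also has a numeric pop field, and that per_state_population is an acceptable (hypothetical) reflection name.

```sql
-- Sketch only: per_state_population is a hypothetical reflection name,
-- and the pop field is assumed to exist in the sample dataset.
ALTER TABLE Samples."samples.dremio.com"."zips.json"
CREATE AGGREGATE REFLECTION per_state_population
USING
DIMENSIONS (state)
MEASURES (city (COUNT), pop (SUM))
PARTITION BY (state)
LOCALSORT BY (state)
```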
    

    Creating External Reflections

    Syntax
    ALTER DATASET <source_dataset_path> 
    CREATE EXTERNAL REFLECTION <reflection_name> 
    USING <target_dataset_path>
    

    Parameters

    <source_dataset_path>

    String

    The path of the view on which you are basing the external reflection.


    <reflection_name>

    String

    The name to give to the external reflection.


    <target_dataset_path>

    String

    The path of the derived table.
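

Example

The following is a minimal sketch of the syntax above, not taken from an actual deployment. The view "@user1"."sales_by_region", the reflection name sales_by_region_ext, and the derived table my_source."sales_by_region_agg" are all hypothetical names chosen for illustration.

```sql
-- Sketch only: every dataset and reflection name here is hypothetical.
-- The view after ALTER DATASET is the source dataset;
-- the table after USING is the derived table.
ALTER DATASET "@user1"."sales_by_region"
CREATE EXTERNAL REFLECTION sales_by_region_ext
USING my_source."sales_by_region_agg"
```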