On this page

    Querying Apache Iceberg Tables

    The SELECT command queries table data and table metadata, and can query by timestamp and snapshot ID.

    note:

    Dremio supports only the copy-on-write storage mechanism and reads only the latest data files for each Iceberg v2 table that you run SQL commands against. Dremio does not support Iceberg v2 tables that have merge-on-read manifests.

    Querying a Table’s Data

    Syntax
    SELECT [ <column1_name> [ , <column2_name> ... ] | * ] 
    FROM <table_path>.<table_name>
    [WHERE where_condition]
    [constructs]
    

    Parameters

    <column1_name> [ , <column2_name> ... ]

    String

    The name of the column or columns that you want to query.


    *

    String

    Indicates that you want to query all columns in the table.


    <table_name>

    String

    The name of the table that you want to query.


    WHERE where_condition

    String

    The condition to use to query a subset of the rows in the table.


    contructs

    String

    The SELECT command supports many constructs, such as GROUP BY, ORDER BY, SORT, and more.

    Metadata Queries

    Iceberg includes helpful system-table references which provide an easy access to Iceberg-specific information on tables, including:

    • The data files for a table
    • The history of a table
    • The manifest files for a table
    • The snapshots for a table

    Querying a Table’s Data File Metadata

    Queries use the table_files() function.

    To run a SELECT command, the user must have Dremio’s SELECT privilege.

    Syntax
    SELECT * 
    FROM TABLE( table_files('<table_path>.<table_name>') )
    

    Dremio returns records that have these fields:

    Column Data Type Description
    file_path VARCHAR Full file path and name
    file_format VARCHAR Format, e.g. PARQUET
    record_count BIGINT Number of rows
    file_size_in_bytes BIGINT Size of file
    column_sizes VARCHAR List of columns with size of each column
    value_counts VARCHAR List of columns with number of records with a value
    null_value_counts VARCHAR List of columns with number of records as NULL
    nan_value_counts VARCHAR List of columns with number of records as NaN
    lower_bounds VARCHAR List of columns with lower bound of each
    upper_bounds VARCHAR List of columns with upper bound of each
    key_metadata VARCHAR Key metrics
    split_offsets VARCHAR Split offsets

    Parameters

    <table_name>

    String

    The name of the table that you want to query.

    Querying a Table’s History Metadata

    Queries use the table_history() function.

    To run a SELECT command, the user must have Dremio’s SELECT privilege.

    Syntax
    SELECT * 
    FROM TABLE( table_history('<table_path>.<table_name>') )
    

    Dremio returns records that have these fields:

    Column Data Type Description
    made_current_at TIMESTAMP The timestamp the Iceberg snapshot was made at
    snapshot_id VARCHAR The Iceberg snapshot ID
    parent_id VARCHAR The parent snapshot ID, null if not exists
    is_current_ancestor BOOLEAN If the snapshot is part of the current history, shows abandoned snapshots

    Example

    Example with table_history()
    SELECT * 
    FROM TABLE( table_history('myTable')) 
    WHERE snapshot_id = 4593468819579153853
    

    Querying a Table’s Manifest File Metadata

    Queries use the table_manifests() function.

    To run a SELECT command, the user must have Dremio’s SELECT privilege.

    Syntax
    SELECT * 
    FROM TABLE( table_manifests('<table_path>.<table_name>') )
    

    Dremio returns records that have these fields:

    Column Data Type Description
    path VARCHAR Full path and name of manifest file
    length BIGINT Size in bytes
    partition_spec_id VARCHAR
    added_snapshot_id VARCHAR ID of snapshot added to manifest
    added_data_files_count BIGINT Number of new data files added
    existing_data_files_count BIGINT Number of existing data files
    deleted_data_files_count BIGINT Number of files removed
    partition_summaries VARCHAR Partition information

    Querying a Table’s Snapshot Metadata

    Queries use the table_snapshot() function.

    To run a SELECT command, the user must have Dremio’s SELECT privilege.

    Syntax
    SELECT * 
    FROM TABLE( table_snapshot('<table_path>.<table_name>') )
    

    Dremio returns records that have these fields:

    Column Data Type Description
    committed_at TIMESTAMP The timestamp the Iceberg snapshot was committed
    snapshot_id VARCHAR The Iceberg snapshot ID
    parent_id VARCHAR The parent snapshot ID, null if not exists
    operation VARCHAR The Iceberg operation (e.g. append)
    manifest_list VARCHAR List of manifest files for the snapshot
    summary VARCHAR Additional attributes (records added, etc)

    Example

    Finds the number of snapshots for a table
    SELECT count(*) 
    FROM TABLE( table_snapshot('myTable')) 
    GROUP BY snapshot_id
    

    Time Travel Queries

    Iceberg tables support both TIMESTAMP-based references and Snapshot ID-based references to specify an earlier version of a table to read.

    Time Travel by Timestamps

    The query returns an error stating that the reference for this table was out of range if either of these two conditions is true:

    • The value of the timestamp is some time in the future, even by one second.
    • The value of the timestamp is older than the oldest valid snapshot for the table.

    To run a SELECT command, the user must have Dremio’s SELECT privilege.

    Syntax
    SELECT <column_name> 
    FROM <table_path>.<table_name> AT <timestamp>
    

    Parameters

    <column_name>

    String

    The name of the column that you want to query.


    <table_name>

    String

    The name of the table that you want to query.


    <timestamp>

    String

    Uses the most recent Iceberg snapshot as of the provided timestamp. <timestamp> may be any SQL expression that resolves to a single timestamp type value, for example: CAST( DATE_SUB(CURRENT_DATE,1) AS TIMESTAMP ) or TIMESTAMP '2022-07-01 01:30:00.000'.

    Example

    Time travel query on an Iceberg table using a timestamp
    SELECT count(*) 
    FROM my_table AT TIMESTAMP '2022-07-01 01:30:00.000'
    

    Time Travel by Snapshot ID

    To run a SELECT command, the user must have Dremio’s SELECT privilege.

    Syntax
    SELECT <column_name> 
    FROM <table_path>.<table_name> AT SNAPSHOT '<snapshot-id>'
    

    Parameters

    <column_name>

    String

    The name of the column that you want to query.


    <table_name>

    String

    The name of the table that you want to query.


    <snapshot_id>

    String

    A snapshot ID obtained either through the table_history() or table_snapshot() metadata function.

    Example

    Time travel query on an Iceberg table using a snapshot ID
    SELECT *
    FROM myTable AT SNAPSHOT '5393090506354317772'