Version: 24.3.x

SELECT

The SELECT command queries table data and table metadata, and can query by timestamp and snapshot ID.

note

Dremio supports only the copy-on-write storage mechanism and reads only the latest data files for each Iceberg v2 table that you run SQL commands against. Dremio does not support Iceberg v2 tables that have merge-on-read manifests.

Querying a Table's Data

Syntax

SELECT [ <column1_name> [ , <column2_name> ... ] | * ]
FROM <table_name>
[WHERE where_condition]
[constructs]

Parameters

<column1_name> [ , <column2_name> ... ] String

The name of the column or columns that you want to query.

Indicates that you want to query all columns in the table.

<table_name> String

The name of the table that you want to query.

WHERE where_condition String

The condition to use to query a subset of the rows in the table.

contructs String

The SELECT command supports many constructs, such as GROUP BY, ORDER BY, SORT, and more.

Metadata Queries

Iceberg includes helpful system-table references which provide an easy access to Iceberg-specific information on tables, including:

The data files for a table
The history of a table
The manifest files for a table
The partitions of a table
The snapshots for a table
Information about records from CSV or JSON files that were in error and not loaded into a table by a COPY INTO operation for which ON ERROR was set to 'continue'

Querying a Table's Data File Metadata

Queries use the table_files() function.

Syntax

SELECT *
FROM TABLE( table_files('<table_name>') )

Dremio returns records that have these fields:

Column	Data Type	Description
file_path	VARCHAR	Full file path and name
file_format	VARCHAR	Format, e.g. PARQUET
record_count	BIGINT	Number of rows
file_size_in_bytes	BIGINT	Size of file
column_sizes	VARCHAR	List of columns with size of each column
value_counts	VARCHAR	List of columns with number of records with a value
null_value_counts	VARCHAR	List of columns with number of records as NULL
nan_value_counts	VARCHAR	List of columns with number of records as NaN
lower_bounds	VARCHAR	List of columns with lower bound of each
upper_bounds	VARCHAR	List of columns with upper bound of each
key_metadata	VARCHAR	Key metrics
split_offsets	VARCHAR	Split offsets

Parameters

<table_name> String

The name of the table that you want to query.

Querying a Table's History Metadata

Queries use the table_history() function.

Syntax

SELECT *
FROM TABLE( table_history('<table_name>') )

Dremio returns records that have these fields:

Column	Data Type	Description
made_current_at	TIMESTAMP	The timestamp the Iceberg snapshot was made at
snapshot_id	VARCHAR	The Iceberg snapshot ID
parent_id	VARCHAR	The parent snapshot ID, null if not exists
is_current_ancestor	BOOLEAN	If the snapshot is part of the current history, shows abandoned snapshots

Example

Example with table_history()

SELECT *
FROM TABLE( table_history('myTable'))
WHERE snapshot_id = 4593468819579153853

Querying a Table's Manifest File Metadata

Queries use the table_manifests() function.

Syntax

SELECT *
FROM TABLE( table_manifests('<table_name>') )

Dremio returns records that have these fields:

Column	Data Type	Description
path	VARCHAR	Full path and name of manifest file
length	BIGINT	Size in bytes
partition_spec_id	VARCHAR
added_snapshot_id	VARCHAR	ID of snapshot added to manifest
added_data_files_count	BIGINT	Number of new data files added
existing_data_files_count	BIGINT	Number of existing data files
deleted_data_files_count	BIGINT	Number of files removed
partition_summaries	VARCHAR	Partition information

Querying a Table's Partition Metadata

Queries use the table_partitions() function to return partition-related statistics.

Syntax

SELECT *
FROM TABLE( table_partitions('<table_name>') )

Dremio returns records that have these fields:

Column Name	Data Type	Description
partition	CHARACTER VARYING	The partition key
record_count	INTEGER	The number of records in the partition
file_count	INTEGER	The number of data files in the partition
spec_id	INTEGER	The ID of the partition specification on which partition is based.

Querying a Table's Snapshot Metadata

Queries use the table_snapshot() function.

Syntax

SELECT *
FROM TABLE( table_snapshot('<table_name>') )

Dremio returns records that have these fields:

Column	Data Type	Description
committed_at	TIMESTAMP	The timestamp the Iceberg snapshot was committed
snapshot_id	VARCHAR	The Iceberg snapshot ID
parent_id	VARCHAR	The parent snapshot ID, null if not exists
operation	VARCHAR	The Iceberg operation (e.g. append)
manifest_list	VARCHAR	List of manifest files for the snapshot
summary	VARCHAR	Additional attributes (records added, etc)

Example

Finds the number of snapshots for a table

SELECT count(*)
FROM TABLE( table_snapshot('myTable'))
GROUP BY snapshot_id

Querying information about rejected records in CSV or JSON files used by a COPY INTO operation for which ON_ERROR was set to 'continue'

Queries use the copy_errors() function.

Syntax

SELECT * 
FROM TABLE(copy_errors(‘<table_name>’, [‘<query_id>’])

<table_name>
The name of the target table on which the COPY INTO operation was performed.
<query_id>
Optional parameter. The ID of the job that ran the COPY INTO operation. You can obtain this ID from the SYS.COPY_ERRORS_HISTORY system table. If you do not specify an ID, the default value is the ID of the last job started by the current user to run COPY INTO on the target table.

The records returned consist of these fields:

Column	Data Type	Description
job_id	string	The ID of the job that ran the COPY INTO operation.
file_name	string	The full path of the file where the validation error was encountered.
line_number	long	The number of the line (physical position) in the file where the error was encountered.
row_number	long	The row (record) number in input file.
column_name	string	The name of the column where the error was encountered.
error	string	A message describing the error.

note

This function reconstructs the original COPY INTO command and executes it in validation mode. No records are inserted into the target table. The maximum number of input files that it uses is determined by the dremio.copy.into.errors_max_input_files support key, which has a default value of 100. The selection of input files is based on alphabetical sorting and might represent a subset of the original list of input files that contain rejected records.

Time Travel Queries

Iceberg tables support both TIMESTAMP-based references and Snapshot ID-based references to specify an earlier version of a table to read.

Time Travel by Timestamps

The query returns an error stating that the reference for this table was out of range if either of these two conditions is true:

The value of the timestamp is some time in the future, even by one second.
The value of the timestamp is older than the oldest valid snapshot for the table.

To run a SELECT command, the user must have Dremio's SELECT privilege.

Syntax

SELECT <column_name>
FROM <table_name> AT <timestamp>

Parameters

<column_name> String

The name of the column that you want to query.

<table_name> String

The name of the table that you want to query.

<timestamp> String

Uses the most recent Iceberg snapshot as of the provided timestamp. <timestamp> may be any SQL expression that resolves to a single timestamp type value, for example: CAST( DATE_SUB(CURRENT_DATE,1) AS TIMESTAMP ) or TIMESTAMP '2022-07-01 01:30:00.000'.

Example

Time travel query on an Iceberg table using a timestamp

SELECT count(*)
FROM my_table AT TIMESTAMP '2022-07-01 01:30:00.000'

Time Travel by Snapshot ID

To run a SELECT command, the user must have Dremio's SELECT privilege.

Syntax

SELECT <column_name>
FROM <table_name> AT SNAPSHOT '<snapshot-id>'

Parameters

<column_name> String

The name of the column that you want to query.

<table_name> String

The name of the table that you want to query.

<snapshot_id> String

A snapshot ID obtained either through the table_history() or table_snapshot() metadata function.

Example

Time travel query on an Iceberg table using a snapshot ID

SELECT *
FROM myTable AT SNAPSHOT '5393090506354317772'

Querying a Table's Data​

Parameters​

Metadata Queries​

Querying a Table's Data File Metadata​

Parameters​

Querying a Table's History Metadata​

Example​

Querying a Table's Manifest File Metadata​

Querying a Table's Partition Metadata​

Querying a Table's Snapshot Metadata​

Example​

Querying information about rejected records in CSV or JSON files used by a COPY INTO operation for which ON_ERROR was set to 'continue'​

Time Travel Queries​

Time Travel by Timestamps​

Parameters​

Example​

Time Travel by Snapshot ID​

Parameters​

Example​

Querying a Table's Data

Parameters

Metadata Queries

Querying a Table's Data File Metadata

Parameters

Querying a Table's History Metadata

Example

Querying a Table's Manifest File Metadata

Querying a Table's Partition Metadata

Querying a Table's Snapshot Metadata

Example

Querying information about rejected records in CSV or JSON files used by a COPY INTO operation for which ON_ERROR was set to 'continue'

Time Travel Queries

Time Travel by Timestamps

Parameters

Example

Time Travel by Snapshot ID

Parameters

Example