Skip to main content
Version: current [25.x]

CREATE TABLE AS

Create a new table as a select statement from another table.

note

Table names cannot include the following special characters: /, :, [, or ].

Syntax
CREATE TABLE [IF NOT EXISTS] <table_name>
[ HASH PARTITION BY ( { <column_name> | <partition_transform> } [ , ... ] ) ]
[ LOCALSORT BY ( <column_name> [ , ... ] ) ]
[ TBLPROPERTIES ( '<property_name>' = '<property_value>' [ , ... ] ) ]
AS <select_statement>

Parameters

IF NOT EXISTS Optional

Causes Dremio to run the statement only if a table with the same name in the same path does not already exist.


<table_name> String

The unique name of the table that you want to create. You can create any of the following:

  • A table in a filesystem source formatted as <source>.<table_name>.
  • A table in the scratch directory formatted as $scratch.<table_name>.
  • An Iceberg table formatted as <table_name> in an Amazon Glue data source, Amazon S3 data source, or external Nessie data source.
note
  • Tables in the scratch location and in filesystem sources are written in Parquet format.
  • No table is created in the filesystem source when running a CTAS against an empty table and storing the data in Parquet format ( STORE AS (type=>'parquet') ).

After a source table has been created, the Dremio console displays the following:

  • Path where the table was created.
  • Number of records in the new table.
  • File size of the table.
note
  • For additional security, access can be further controlled by selecting specifically restricted filesystem sub-directories when you initially create your Dremio source.
  • CTAS functionality is enabled on a per source basis. To use the CREATE TABLE AS (CTAS) command on a source, you must grant source privileges to the appropriate users.

HASH PARTITION BY ( { <column_name> | <partition_transform> } [ , ... ] ) String   Optional

When you designate column(s) using this clause, Dremio will store the rows containing the same hash partition values in their own Parquet files. Using this clause provides speed optimization for when your WHERE clause matches your PARTITION BY clause.

  • <column_name>: The unique name of the column. Multiple columns may be specified as long as the column names include their data type. Use commas to separate each column name and data-type pair. If a column is listed as a partition column, it cannot also be listed as a sort column for the same table. Each column specified must also be listed as a dimension column.

  • <partition_transform>: Applies only to Iceberg tables. Use one of these partition-transformation functions:

    TransformDescription
    identity( <col> )Explicitly specified identity transform.
    year( <col> )Partition by year. The column uses either the TIMESTAMP or DATE data type.
    month( <ts_col> )Partition by month. The column uses either the TIMESTAMP or DATE data type.
    day( <ts_col> )Partition by day. The column uses either the TIMESTAMP or DATE data type.
    hour( <ts_col> )Partition by hour. The column uses either the TIMESTAMP or DATE data type.
    bucket( <count>, <col> )Partition by hashed value into <count> buckets.
    truncate( <length>, <col> )Partition by truncated value.
    • Strings are truncated to the specified length.
    • Integer and biginteger values are truncated to bins.
      Example: truncate(10, i) produces 0, 10, 20, and so on.
    note
    • If you partition a table by a column that has more distinct values than Dremio can partition by, you will receive an error message. If you receive this error message, try partitioning your table by a different column.
    • If the table is not partitioned, you will receive an error when you attempt to refresh the metadata using the PARTITION BY clause.

LOCALSORT BY ( <column_name> [ , ... ] ) String   Optional

Sorts the records of the table by the values in the specified column.

The name of the column that you want to sort the table by. Sorts each Parquet file fragment by the designated column. If you are doing a range filter on your LOCALSORT column, your query will be faster because the data is already sorted.


TBLPROPERTIES ( '<property_name>' = '<property_value>' [ , ... ] )

Sets one or more table properties for controlling the behavior of the table. See Properties of Apache Iceberg Tables for a list of the supported properties.


AS <select_statement> String

Use the SELECT statement to populate the new table using data from an existing table or view.


Examples

Create a table as SELECT * from another table
CREATE TABLE myAmazonS3Source.myFolder.myTable
AS SELECT * FROM myAmazonS3Source.anotherFolder.anotherTable;
Create a table and partition it by month
CREATE TABLE myTable
HASH PARTITION BY (month(col2))
AS SELECT * FROM source.anotherFolder.anotherTable;
Create a $scratch table
CREATE TABLE $scratch.my_table
AS SELECT * FROM TPCH.lineitem;
Create a source table
CREATE TABLE s3.upload.lineitem2
AS SELECT * FROM TPCH.lineitem;