Skip to main content
Version: current [25.x]

CREATE TABLE

Create a new table.

note

Table names cannot include the following special characters: /, :, [, or ].

Syntax
CREATE TABLE [IF NOT EXISTS] <table_name>
( <column_name> <data_type> [ , ... ] )
[ MASKING POLICY <function_name> ( <column_name> [ , ... ] ) ]
[ ROW ACCESS POLICY <function_name> ( <column_name> [ , ... ] ) ]
[ PARTITION BY ( { <column_name> | <partition_transform> } [ , ... ] ) ]
[ LOCALSORT BY ( <column_name> [ , ... ] ) ]
[ TBLPROPERTIES ('<property_name>' = '<property_value>' [ , ... ] ) ]

Parameters

IF NOT EXISTS Optional

Causes Dremio to run the statement only if a table with the same name in the same path does not already exist.


<table_name> String

The unique name of the table that you want to create. You can create any of the following:

  • A table in a filesystem source formatted as <source>.<table_name>.
  • A table in the scratch directory formatted as $scratch.<table_name>.
  • An Iceberg table formatted as <table_name> in an Amazon Glue data source, Amazon S3 data source, or external Nessie data source.
note
  • Tables in the scratch location and in filesystem sources are written in Parquet format.
  • No table is created in the filesystem source when running a CTAS against an empty table and storing the data in Parquet format ( STORE AS (type=>'parquet') ).

After a source table has been created, the Dremio console displays the following:

  • Path where the table was created.
  • Number of records in the new table.
  • File size of the table.
note
  • For additional security, access can be further controlled by selecting specifically restricted filesystem sub-directories when you initially create your Dremio source.

( <column_name> <data_type> [ , ... ] ) String

Creates one or more columns that have the specified names, data types, and character limits. The size is an attribute of the data type.

These are the supported primitive types:

  • BOOLEAN
  • VARBINARY
  • DATE
  • FLOAT
  • DECIMAL
  • DOUBLE
  • INTERVAL
  • INT
  • BIGINT
  • TIME
  • TIMESTAMP
  • VARCHAR (The length is always 65536 bytes. If a length is specified, it is ignored.)

You can define complex types by using either of these two sets of syntax:

Set 1

  • struct_type: ROW( name primitive_or_complex_type, .. )
  • list_type: ARRAY(primitive_or_complex_type)

Examples:

ROW(innerfield INT, anotherinnerfield DOUBLE)
ARRAY(INT)
ROW(innerfield ARRAY(INT))
ARRAY(ROW(innerfield INT))

Set 2

  • struct_type: STRUCT <name : primitive_or_complex_type, ... >
  • list_type: { LIST | ARRAY } < primitive_or_complex_type >

Examples:

STRUCT<innerfield : INT, anotherinnerfield : DOUBLE>
LIST<INT>
ARRAY<INT>
STRUCT<innerfield : LIST<INT>>
LIST<STRUCT<innerfield : INT>>

You cannot use the INSERT command to insert data into columns that use a complex data type.


MASKING POLICY <function_name> ( <column_name> [ , ... ] ) String   Optional

Sets a policy for masking data. The UDF serving as the masking policy must accept and return the same data type as the column it is masking. If a UDF with the given name does not exist, then the affected table will not be reachable until the policy is dropped or a UDF created.


ROW ACCESS POLICY <function_name> ( <column_name> [ , ... ] ) String   Optional

Adds a row-access policy to the table. The UDF serving as the row-access policy must return BOOLEAN. If the UDF specified in the policy does not exist, then Dremio gives an error when attempting to query the table.


PARTITION BY ( { <column_name> | <partition_transform> } [ , ... ] ) String   Optional

The columns or partition-transformation functions on which to partition the table data.

  • <column_name>: The unique name of the column. Multiple columns may be specified as long as the column names include their data type. Use commas to separate each column name and data-type pair. If a column is listed as a partition column, it cannot also be listed as a sort column for the same table. Each column specified must also be listed as a dimension column.

  • <partition_transform>: Applies only to Iceberg tables. Use one of these partition-transformation functions:

    TransformDescription
    identity( <col> )Explicitly specified identity transform.
    year( <col> )Partition by year. The column uses either the TIMESTAMP or DATE data type.
    month( <ts_col> )Partition by month. The column uses either the TIMESTAMP or DATE data type.
    day( <ts_col> )Partition by day. The column uses either the TIMESTAMP or DATE data type.
    hour( <ts_col> )Partition by hour. The column uses either the TIMESTAMP or DATE data type.
    bucket( <count>, <col> )Partition by hashed value into <count> buckets.
    truncate( <length>, <col> )Partition by truncated value.
    • Strings are truncated to the specified length.
    • Integer and biginteger values are truncated to bins.
      Example: truncate(10, i) produces 0, 10, 20, and so on.

LOCALSORT BY ( <column_name> [ , ... ] ) String   Optional

Sorts the records of the table by the values in the specified column.

The name of the column that you want to sort the table by. Sorts each Parquet file fragment by the designated column. If you are doing a range filter on your LOCALSORT column, your query will be faster because the data is already sorted.


TBLPROPERTIES ( '<property_name>' = '<property_value>' [ , ... ] )

Sets one or more table properties for controlling the behavior of the table. Applies only to Iceberg tables. See Properties of Apache Iceberg Tables for a list of the supported properties.

Examples

Create an Iceberg table in an Amazon S3 data source
CREATE TABLE <Amazon_S3_data_source>.folder1.folder2.table_A;
Create an Iceberg table in an Amazon Glue Data source
CREATE TABLE database.schema.table_A;
Create an Iceberg table and partition it by month
CREATE TABLE myTable
(col1 int, col2 date)
PARTITION BY (month(col2));