CREATE
The CREATE TABLE command creates Apache Iceberg tables in AWS Glue data sources, Amazon S3 data sources, or external Nessie data sources.
Prerequisites
Before you attempt to create Iceberg tables, ensure that you are using an AWS Glue, Amazon S3, or external Nessie data source.
Default Table Formats Used for New Tables
Beginning with Dremio v22.0:
- AWS Glue data sources added to projects default to using the Apache Iceberg table format.
- Amazon S3 data sources added to projects default to using the Apache Parquet table format. Follow these steps to ensure that the default table format for new tables is Apache Iceberg (see the example after the steps):
1. In Dremio, click the Amazon S3 data source.
2. Click the gear icon in the top-right corner above the list of the data source's contents.
3. On the Advanced Options page of the Edit Source dialog, select ICEBERG under Default CTAS Format.
4. Click Save.
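With ICEBERG selected as the Default CTAS Format, a CREATE TABLE AS statement such as the following (the source and folder names are hypothetical) produces an Apache Iceberg table:

CREATE TABLE myAmazonS3Source.myFolder.newTable AS SELECT * FROM myAmazonS3Source.myFolder.existingTable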
AWS Glue data sources added to projects before Dremio v22.0 are modified by Dremio to default to the Apache Iceberg table format.
Amazon S3 data sources added to projects before Dremio v22.0 continue to use the Parquet table format for tables. For the SQL commands that you can use to create and query tables in such data sources, see Tables.
Locations in which Iceberg Tables are Created
Where the CREATE TABLE command creates a table depends on the type of data source being used.
Location in AWS Glue Data Sources
By default, the root directory is assumed to be /user/hive/warehouse.
If you want to create tables in a different location, you must specify the S3 address of an Amazon S3 bucket in which to create them:
1. In Dremio, click the AWS Glue data source.
2. Click the gear icon in the top-right corner above the list of the data source's contents.
3. On the Advanced Options page of the Edit Source dialog, add the connection property hive.metastore.warehouse.dir.
4. Set the value to the S3 address of an S3 bucket.
The schema path and table name are appended to the root location to determine the default physical location for a new table. For example, this CREATE TABLE command creates the table table_A in the directory <rootdir>/database/schema/table_A:

CREATE TABLE database.schema.table_A
Location in Amazon S3 Data Sources
The root physical location is the root directory of the filesystem. The folder path and table name are appended to this location to determine the physical location for a new table.
For example, this CREATE TABLE command creates the table table_A in the directory rootdir/folder1/folder2/table_A:

CREATE TABLE <Amazon_S3_data_source>.folder1.folder2.table_A
Location in Nessie Data Sources
Top-level Nessie schemas have configurable physical storage, which is used as the default root physical location.
In the project store, each top-level Nessie schema has its own directory path. For example, in the project's Nessie data source, the top-level schema marketing would be located in project_store/marketing, and that directory would be used by default as the root physical location. From there, the same schema and table name resolution described for AWS Glue data sources above applies.
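Continuing that example, a statement like the following (the source name myNessieSource, the table name, and the columns are hypothetical) would create the table's files under project_store/marketing/campaigns:

CREATE TABLE myNessieSource.marketing.campaigns (campaign_id INT, campaign_name VARCHAR)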
Syntax
CREATE TABLE syntax

CREATE TABLE [IF NOT EXISTS] <table_path>.<table_name>
( <column_spec [, ...]> )
[ PARTITION BY ( [<column_name>|<partition_transform>] [ , ... ] ) ]
[ LOCALSORT BY (<column_name>) ]
[ TBLPROPERTIES ('property_name' = 'property_value' [, ...] ) ]
CREATE TABLE <source_name>.<path>.<table_name>
[ PARTITION BY ( <partition_spec> [ , ... ] ) ]
[ LOCALSORT BY ( <column_name> [ , ... ] ) ]
[ TBLPROPERTIES ('property_name' = 'property_value' [, ...] ) ]
AS <query>
Parameters
<table_name> String
The name of the table that you want to create.
[ IF NOT EXISTS ] String Optional
Causes Dremio Cloud to run the statement only if a table with the same name in the same path does not already exist.
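For example, the following statement (the path and column are hypothetical) completes without error even when the table already exists; in that case, it simply does not re-create the table:

CREATE TABLE IF NOT EXISTS myAmazonS3Source.myFolder.myTable (col1 INT)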
<column_spec [, ...]> String Optional
Defines one or more columns and their data types, as in this example:
( columnA FLOAT, columnB INT )
The data types can be primitive or complex.
These are the supported primitive data types (see the example after the list):
- BOOLEAN
- VARBINARY
- DATE
- FLOAT
- DECIMAL
- DOUBLE
- INTERVAL
- INT
- BIGINT
- TIME
- TIMESTAMP
- VARCHAR (The length is always 65536 bytes. If a length is specified, it is ignored.)
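For illustration, the following statement (the table path and column names are hypothetical) combines several of these primitive types:

CREATE TABLE myAmazonS3Source.myFolder.orders (order_id BIGINT, is_paid BOOLEAN, total DECIMAL(10, 2), ordered_at TIMESTAMP, notes VARCHAR)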
You can define complex types by using either of these two sets of syntax:
Set 1
- struct_type:
ROW( name primitive_or_complex_type, ... )
- list_type:
ARRAY(primitive_or_complex_type)
Examples:
ROW(innerfield INT, anotherinnerfield DOUBLE)
ARRAY(INT)
ROW(innerfield ARRAY(INT))
ARRAY(ROW(innerfield INT))
Set 2
- struct_type:
STRUCT <name : primitive_or_complex_type, ... >
- list_type:
{ LIST | ARRAY } < primitive_or_complex_type >
Examples:
STRUCT<innerfield : INT, anotherinnerfield : DOUBLE>
LIST<INT>
ARRAY<INT>
STRUCT<innerfield : LIST<INT>>
LIST<STRUCT<innerfield : INT>>
You cannot use the INSERT SQL command to insert data into columns that use a complex data type.
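For example, these two statements (the table path is hypothetical) are equivalent ways of declaring a table with a list column, using Set 1 and Set 2 syntax respectively:

CREATE TABLE myAmazonS3Source.myFolder.events (event_id INT, tags ARRAY(VARCHAR))
CREATE TABLE myAmazonS3Source.myFolder.events (event_id INT, tags LIST<VARCHAR>)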
PARTITION BY ( [ <column_name> | <partition_transform> ] [ , ... ] ) String Optional
Partitions the table data on the values of one or more columns, or by using one or more of these partition-transformation functions (see the example after the table):

Transform | Description
---|---
identity( <col> ) | Explicitly specified identity transform.
year( <ts_col> ) | Partition by year. The column must use the TIMESTAMP or DATE data type.
month( <ts_col> ) | Partition by month. The column must use the TIMESTAMP or DATE data type.
day( <ts_col> ) | Partition by day. The column must use the TIMESTAMP or DATE data type.
hour( <ts_col> ) | Partition by hour. The column must use the TIMESTAMP data type.
bucket( <count>, <col> ) | Partition by hashed value into <count> buckets.
truncate( <length>, <col> ) | Partition by value truncated to <length>.
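As a sketch (the table path and column names are hypothetical), this statement combines a date-based transform with a bucket transform:

CREATE TABLE myAmazonS3Source.myFolder.sales (sale_id BIGINT, sold_at TIMESTAMP) PARTITION BY (month(sold_at), bucket(16, sale_id))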
LOCALSORT BY (<column_name>) String Optional
Sorts the records of the table by the values in the specified column.
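For example (the table path and columns are hypothetical):

CREATE TABLE myAmazonS3Source.myFolder.customers (customer_id INT, customer_name VARCHAR) LOCALSORT BY (customer_name)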
TBLPROPERTIES ('property_name' = 'property_value' [, ...] ) String Optional
Sets one or more table properties for controlling the behavior of the table. See Properties of Apache Iceberg Tables for a list of the supported properties.
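As a sketch (the table path is hypothetical, and this assumes the Iceberg format-version property is among the supported properties), the following sets a table property at creation time:

CREATE TABLE myAmazonS3Source.myFolder.myTable (col1 INT) TBLPROPERTIES ('format-version' = '2')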
AS <query> String Optional
Uses the records returned by a SELECT statement to define and populate the table.
Examples
Creating a table as SELECT * from another table

CREATE TABLE myAmazonS3Source.myFolder.myTable AS SELECT * FROM myAmazonS3Source.anotherFolder.anotherTable
Creating a partitioned table

CREATE TABLE myTable (col1 INT, col2 DATE) PARTITION BY (month(col2))