    Creating Apache Iceberg Tables

    The CREATE TABLE command creates Apache Iceberg tables in Amazon Glue and Amazon S3 datasources.

    Prerequisites

    Before you attempt to create Iceberg tables, ensure that you are using an Amazon Glue or Amazon S3 datasource.

    Default Table Formats Used for New Tables

    Beginning with Dremio v22.0:

    • Amazon Glue datasources added to projects default to using the Apache Iceberg table format.

    • Amazon S3 datasources added to projects default to using the Apache Parquet table format. Follow these steps to ensure that the default table format for new tables is Apache Iceberg:

      1. In Dremio, click the Amazon S3 datasource.
      2. Click the gear icon in the top-right corner above the list of the datasource’s contents.
      3. On the Advanced Options page of the Edit Source dialog, select ICEBERG under Default CTAS Format.
      4. Click Save.

    Amazon Glue datasources added to projects before Dremio v22.0 are modified by Dremio to default to the Apache Iceberg table format.

    Amazon S3 datasources added to projects before Dremio v22.0 continue to use the Parquet table format for new tables. For the SQL commands that you can use to create and query tables in such datasources, see Tables.

    Locations in which Iceberg Tables are Created

    Where the CREATE TABLE command creates a table depends on the type of datasource being used.

    note:

    Starting in Dremio 23.1.0, when creating a new table, the locationUri property is taken into consideration. This enables simpler CREATE TABLE and CREATE TABLE AS statements that don’t require an explicitly provided location, allows users to segregate data by creating multiple Hive databases that each specify the location where table data must be stored, and eliminates the need to explicitly specify the hive.metastore.warehouse.dir connection property when adding a Hive or Glue source.
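
    For example, assuming a Glue datasource named glue_source that contains a database sales whose locationUri is set (these names are hypothetical), a table can be created without specifying a location:

    -- The table's data files are written under the locationUri
    -- configured for the sales database.
    CREATE TABLE glue_source.sales.orders (order_id INT, order_date DATE)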

    Location in Amazon Glue Datasources

    The root directory is assumed by default to be /user/hive/warehouse.

    If you want to create tables in a different location, you must specify the S3 address of an Amazon S3 bucket in which to create them:

    1. In Dremio, click the Amazon Glue datasource.
    2. Click the gear icon in the top-right corner above the list of the datasource’s contents.
    3. (Optional in Dremio 23.1.0 and later) On the Advanced Options page of the Edit Source dialog, add the connection property hive.metastore.warehouse.dir.
    4. Set the value to the S3 address of an S3 bucket.

    The schema path and table name are appended to the root location to determine the default physical location for a new table. For example, this CREATE TABLE command creates the table table_A in the directory <rootdir>/database/schema/table_A:

    CREATE TABLE database.schema.table_A (col1 INT)
    

    Location in Amazon S3 Datasources

    The root physical location is the root directory of the filesystem. The folder path and table name are appended to this location to determine the physical location for a new table.

    For example, this CREATE TABLE command creates the table table_A in the directory rootdir/folder1/folder2/table_A:

    CREATE TABLE <Amazon_S3_datasource>.folder1.folder2.table_A (col1 INT)
    

    Syntax

    CREATE TABLE [IF NOT EXISTS] <table_path>.<table_name>
    [ ( <column_spec> [, ...] ) ]
    [ PARTITION BY ( { <column_name> | <partition_transform> } [ , ... ] ) ]
    [ LOCALSORT BY (<column_name>) ]
    [ AS <query> ]
    

    Parameters

    <table_path>

    String

    The path in which the table is located.


    <table_name>

    String

    The name of the table that you want to create.


    IF NOT EXISTS

    String

    Optional

    Causes Dremio Cloud to run the statement only if a table with the same name in the same path does not already exist.
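
    For example, the following statement (datasource and folder names are placeholders) succeeds even when the table already exists:

    CREATE TABLE IF NOT EXISTS myAmazonS3Source.myFolder.myTable (id INT, name VARCHAR)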


    <column_spec [, ...]>

    String

    Optional

    Defines one or more columns and their data types, as in this example:

    ( columnA FLOAT, columnB INT )
    The data types can be primitive or complex.
    These are the supported primitive data types:
    • BOOLEAN
    • VARBINARY
    • DATE
    • FLOAT
    • DECIMAL
    • DOUBLE
    • INTERVAL
    • INT
    • BIGINT
    • TIME
    • TIMESTAMP
    • VARCHAR (The length is always 65536 bytes. If a length is specified, it is ignored.)

    You can define complex types by using either of these two sets of syntax:
    Set 1
    • struct_type: ROW( name primitive_or_complex_type, ... )
    • list_type: ARRAY(primitive_or_complex_type)
    Examples:
    ROW(innerfield INT, anotherinnerfield DOUBLE)
    ARRAY(INT)
    ROW(innerfield ARRAY(INT))
    ARRAY(ROW(innerfield INT))
    Set 2
    • struct_type: STRUCT <name : primitive_or_complex_type, ... >
    • list_type: { LIST | ARRAY } < primitive_or_complex_type >
    Examples:
    STRUCT<innerfield : INT, anotherinnerfield : DOUBLE>
    LIST<INT>
    ARRAY<INT>
    STRUCT<innerfield : LIST<INT>>
    LIST<STRUCT<innerfield : INT>>
    You cannot use the INSERT SQL command to insert data into columns that use a complex data type.
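
    For example, this statement (names are illustrative) uses the Set 1 syntax to define columns with complex types:

    CREATE TABLE myAmazonS3Source.myFolder.events (
        id INT,
        location ROW(latitude DOUBLE, longitude DOUBLE),
        tags ARRAY(VARCHAR)
    )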


    PARTITION BY ( [<column_name>|<partition_transform>] [ , ... ] )

    String

    Optional

    Partitions the table data on the values of one or more columns, or by using one or more of these partition-transformation functions:

    • identity( <col> ): Explicitly specified identity transform.
    • year( <col> ): Partition by year. The column must use the DATE or TIMESTAMP data type.
    • month( <col> ): Partition by month. The column must use the DATE or TIMESTAMP data type.
    • day( <col> ): Partition by day. The column must use the DATE or TIMESTAMP data type.
    • hour( <col> ): Partition by hour. The column must use the TIMESTAMP data type.
    • bucket( <count>, <col> ): Partition by hashed value into <count> buckets.
    • truncate( <length>, <col> ): Partition by truncated value. Strings are truncated to the specified length. INT and BIGINT values are truncated to bins. For example, truncate(10, i) produces 0, 10, 20, and so on.
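
    For example, this statement (names are illustrative) combines a time-based transform with bucketing:

    CREATE TABLE myAmazonS3Source.myFolder.orders (id INT, ts TIMESTAMP)
    PARTITION BY (month(ts), bucket(16, id))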


    LOCALSORT BY (<column_name>)

    String

    Optional

    Sorts the records of the table by the values in the specified column.
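
    For example, this statement (names are illustrative) sorts each data file's records by order date:

    CREATE TABLE myAmazonS3Source.myFolder.orders (id INT, order_date DATE)
    LOCALSORT BY (order_date)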


    AS <query>

    String

    Optional

    Uses the records returned by a SELECT statement to define and populate the table.

    Examples

    Creating a table as SELECT * from another table
    CREATE TABLE myAmazonS3Source.myFolder.myTable AS SELECT * FROM myAmazonS3Source.anotherFolder.anotherTable
    
    Creating a table and partitioning it by month
    CREATE TABLE myTable (col1 INT, col2 DATE) PARTITION BY (month(col2))