On this page

    COPY INTO <table>

    You can load data from CSV or JSON files that are in a source into an existing Apache Iceberg table.

    important:

    For information about how the COPY INTO <table> operation works, as well as the supported storage locations and requirements, see Copying Data Into Apache Iceberg Tables.

    Syntax

    Syntax
    COPY INTO <table_name> 
      FROM @<storage_location_name>[/<path>[/<file_name>]]
      [ FILES ( '<file_name>' [ , ... ] ) |
        REGEX '<regex_pattern>' ] 
    [ FILE_FORMAT { 'CSV' | 'JSON' } ]
    [ [csv_format_options] | [json_format_options] ]
    
    csv_format_options
    { DATE_FORMAT '<string>' | TIME_FORMAT '<string>' | TIMESTAMP_FORMAT '<string>' | TRIM_SPACE [ '<boolean>' ] | NULL_IF ( '<string>' [, ...] ) | RECORD_DELIMITER '<character>' | FIELD_DELIMITER '<character>' | QUOTE_CHAR '<character>' | ESCAPE_CHAR '<escape_character>' | EMPTY_AS_NULL [ '<boolean>' ] } [, ...]
    
    json_format_options
    { DATE_FORMAT '<string>' | TIME_FORMAT '<string>' | TIMESTAMP_FORMAT '<string>' | TRIM_SPACE [ '<boolean>' ] | NULL_IF ( '<string>' [, ...] ) } [, ...]
    

    Parameters

    <table_name>

    String

    The name of the table, which can also include the necessary qualifier if the table is not in the current context. Example: Samples.'samples.dremio.com'.'NYC-taxi-trips'


    @<storage_location_name>[/<path>[/<filename>]]

    String

    Specifies the file system location to load files from. The location provided must be a preconfigured Dremio Source such that only the storage location name and file path need to be provided.

    • The location path can be a directory or a specific file name.
    • If a specific file is provided then the `COPY INTO` operation works only on that file.
    • If instead a directory path is provided without a file name then the operation operates on all files in the directory including subdirectories recursively.


    [ FILES ( '<file_name>' [ , ... ] ) | REGEX '<regex_pattern>' ]

    String

    Optional

    Specifies a list of files to be loaded in the `FILES` clause. The files must already have been staged in the storage location specified in the FROM clause. The statement will error when one or more file specified is not found or not accessible.
    <file_name> is of format [<directory_name>]/<file_name>.
    Alternatively, specifies a regular expression pattern string in the `REGEX` clause, enclosed in single quotes, specifying the file names, paths, or both to match. Path is matched relative to the storage location.

    Note:

    For best performance, avoid applying patterns that filter on a large number of files.

    You can use either `FILES` or `REGEX`, but not both.
    The maximum number of file names that can be specified is 1000.
    This clause is mutually exclusive with the [/<file_name>] option in the previous clause.


    [ FILE_FORMAT { 'CSV' | 'JSON' } ]

    String

    Optional

    The format of the file or files to copy data from.

    • If not specified, the file format is determined based upon the file extension, which means that the file extension is required if `FILE_FORMAT` is not specified.
    • To load files that do not have an extension, `FILE_FORMAT` must be specified.
    • All files loaded in a `COPY INTO` operation must be of the same file format.


    csv_format_options

    String

    Options that describe the formats and other characteristics of the source `CSV` file or files.


    json_format_options

    String

    Options that describe the formats and other characteristics of the source `JSON` file or files.

    CSV Format Options

    DATE_FORMAT '<string>'

    String

    Optional

    String that defines the format of date values in the data files to be loaded. If a value is not specified, YYYY-MM-DD is used. See the section 'Date/Time Formatting' in Date/Time for more format elements.


    TIME_FORMAT '<string>'

    String

    Optional

    String that defines the format of time values in the data files to be loaded. If a value is not specified, HH24:MI:SS.FFF is used. See the section 'Date/Time Formatting' in Date/Time for more format elements.


    TIMESTAMP_FORMAT '<string>'

    String

    Optional

    String that defines the format of timestamp values in the data files to be loaded. If a value is not specified, YYYY-MM-DD HH24:MI:SS.FFF is used. See the section 'Date/Time Formatting' in Date/Time for more format elements.


    TRIM_SPACE [ '<boolean>' ]

    String

    Optional

    Boolean that specifies whether or not to remove leading and trailing white space from strings. The default is FALSE.


    NULL_IF ( '<string>' [, ...] )

    String

    Optional

    Replace strings in the data load source with NULL.


    RECORD_DELIMITER '<character>'

    String

    Optional

    One or more characters that separate records in an input file. Accepts common escape sequences. If a value is not specified, \r\n is used.


    FIELD_DELIMITER '<character>'

    String

    Optional

    The single character used two separate fields in the file. If a value is not specified, , is used.


    QUOTE_CHAR '<character>'

    String

    Optional

    The single character used to quote field data within a record. The default is a double-quotation mark.


    ESCAPE_CHAR '<escape_character>'

    String

    Optional

    A single character used as the escape character for the character specified by QUOTE_CHAR. The escape character provides a way to include the QUOTE_CHAR character inside a string literal by modifying the meaning of that character in the string. If a value is not specified, " is used.


    EMPTY_AS_NULL [ '<boolean>' ]

    String

    Optional

    Boolean that specifies whether an empty string is considered a NULL field or an empty string. If a value is not specified, TRUE is used.

    JSON Format Options

    DATE_FORMAT '<string>'

    String

    Optional

    String that defines the format of date values in the data files to be loaded. If a value is not specified, YYYY-MM-DD is used. See the section 'Date/Time Formatting' in Date/Time for more format elements.


    TIME_FORMAT '<string>'

    String

    Optional

    String that defines the format of time values in the data files to be loaded. If a value is not specified, HH24:MI:SS.FFF is used. See the section 'Date/Time Formatting' in Date/Time for more format elements.


    TIMESTAMP_FORMAT '<string>'

    String

    Optional

    String that defines the format of timestamp values in the data files to be loaded. If a value is not specified, YYYY-MM-DD HH24:MI:SS.FFF is used. See the section 'Date/Time Formatting' in Date/Time for more format elements.


    TRIM_SPACE [ '<boolean>' ]

    String

    Optional

    Boolean that specifies whether or not to remove leading and trailing white space from strings. The default is FALSE.


    NULL_IF ( '<string>' [, ...] )

    String

    Optional

    Replace strings in the data load source with NULL.

    Type Coercion

    Source Data Type Target Dremio Data Type
    String BIGINT, BOOLEAN, DATE, DECIMAL, DOUBLE, FLOAT, INT, TIME, TIMESTAMP, VARCHAR
    Integer BIGINT, DECIMAL, DOUBLE, FLOAT, INT, VARCHAR
    Floating-point numbers DECIMAL, DOUBLE, FLOAT, VARCHAR
    Boolean BOOLEAN, VARCHAR
    Object (JSON only) STRUCT
    Array (JSON only) LIST

    note:

    Although the TIMESTAMP data type is supported as a target schema data type, TIMESTAMP with time zone is not supported.

    Output

    The command returns this output:

    Column name Description
    Rows Inserted The number of rows loaded from the source data files.

    Examples

    Copy all files present in a folder
    COPY INTO context.MyTable
      FROM '@SOURCE/bucket/path/folder/' 
      FILE FORMAT 'json'
    
    Copy a specific file in a directory
    COPY INTO context.myTable 
      FROM '@SOURCE/bucket/path/folder/file1.json'
    
    Copying files using a regular expression pattern string matching filename
    COPY INTO context.myTable 
      FROM '@SOURCE/bucket/path/folder' 
      REGEX '.*.csv'
    
    Copying files using a regular expression pattern string matching filename and path
    COPY INTO 'context.myTable' 
      FROM '@SOURCE/bucket/path/folder' 
      REGEX '^2020-11-1[2-3]/.*/' 
      FILE_FORMAT 'json'
    
    Copying files using a regular expression pattern string in the specified file format
    COPY INTO context.myTable 
      FROM '@SOURCE/bucket/path/folder' 
      REGEX 'part.*' 
      FILE_FORMAT 'json'
    
    Copying a list of files present on a Dremio source
    COPY INTO context.myTable 
      FROM '@SOURCE/bucket/path/folder'
      FILES ('foo.csv', 'dir/bar.csv')
    
    Copying files containing DATE and TIME types
    COPY INTO context.myTable 
      FROM '@SOURCE/bucket/path/folder' 
      FILES ('date_time_data_file1.csv', 'date_time_data_file2.csv') 
      (DATE_FORMAT 'DD-MM-YYYY', TIME_FORMAT 'HH24:MI:SS')
    
    Copying CSV files and replacing certain entries with NULL, if encountered
    COPY INTO context.myTable 
      FROM '@SOURCE/bucket/path/folder' 
      FILES ('file1.csv', 'file2.csv', 'file3.csv') 
      (NULL_IF ('None', 'NA'))
    
    Copying with the following transformations: Trimming any leading and trailing whitespaces for each entry, and treating empty values as NULL
    COPY INTO context.myTable 
      FROM '@SOURCE/bucket/path/folder' 
      (TRIM_SPACE 'true', EMPTY_AS_NULL 'true')
    
    Copying CSV files into a table with RECORD_DELIMITER and FIELD DELIMITER clauses
    COPY INTO context.myTable 
      FROM '@SOURCE/bucket/path/folder' 
      FILE_FORMAT 'csv' 
      (RECORD_DELIMITER '\n', FIELD_DELIMITER '\t')
    
    Copying a CSV file into a table with an ESCAPE_CHAR clause
    COPY INTO context.myTable
      FROM '@SOURCE/bucket/path/folder'
      FILES ('fileName.csv')
      (ESCAPE_CHAR '|')