Skip to main content
Version: current [26.x]

Transforming Data on Load

Dremio allows data transformations during loading with the COPY INTO SQL command, simplifying the ETL process by enabling simple transformations as data is ingested.

Common use cases for this functionality include:

  • Selecting a subset of columns from source files to write into the target table.
  • Reordering columns in source files to match the order in the target table.
  • Converting data types from source files into target table type, which is most common with CSVs.
  • Applying simple elementwise functions, which would require additional processing later.

Supported File Formats

Dremio's COPY INTO SQL command with transformations supports Parquet and CSV file formats.

Select a Subset of Columns

Imagine we receive a file that contains the available inventory in each store every day; however, the file also includes item details that we don't need. We need to load only a subset of the data and exclude information that does not help build our data product. To achieve this, we run the following:

COPY INTO inventory( store_id, stock_date,  item_id, amount,)
FROM (SELECT t.$1, t.$5, t.$2, t.$6 FROM '@mystage/customer/.csv' t)
FILE_FORMAT 'csv';

Reorder Columns

Sometimes, the data in the underlying file does not match how you want to materialize the data in Dremio. In the example below, we have a CSV snapshot of our customer data, and the cust_id column is the fifth column in the file. As a best practice, we want to move the table's primary key to the first column of the table. To achieve this, we load the data with the following:

COPY INTO dim_customer(cust_id, name, address, age)
FROM (SELECT t.$5, t.$1, t.$2, t.$3 FROM '@mystage/customer/.csv' t)
FILE_FORMAT 'csv';

Convert Data Types

If you want to convert the source data types of your files into target types, you can add that to your SELECT statement as well.

COPY INTO dim_customer(cust_id, name, address, age)
FROM (SELECT CAST(t.$5 AS INTEGER), .$1, t.$2, t.$3 FROM '@mystage/customer/.csv' t)
FILE_FORMAT 'csv';

Supported Functions

Unsupported Capabilities

The SELECT statement used for transforming data in the COPY INTO SQL command has the following restrictions:

  • WHERE clause
  • ORDER BY statement
  • GROUP BY statement
  • JOIN
  • aggregate and window functions

Unsupported file types

The COPY INTO with transformations capability does not support JSON, AVRO, or ORC file types.

Feedback