Skip to main content

Update Data in a Table

Now, you’ll make an update to the nyc_trips table you’ve been working with. DML capabilities are enabled by Apache Iceberg, a high-performance table format that solves challenges with traditional tables in data lakes.

Step 1. Create a Branch and Update a Table

Create a branch off of the main branch to isolate any changes that you’ll be making to your nyc_trips table. Similar to Git’s version control semantics for code, a branch in Arctic represents an independent line of development isolated from the main branch. Arctic does this with no data copy, and enables you to update, insert, and delete data in tables on new branches without impacting the original table in the main branch. When you are ready to update the main branch from changes in a secondary branch, you can merge them in Step #4.

  1. To create a new branch, run the SQL below in the SQL Runner.

    Create a branch
    CREATE BRANCH nyc_etl
    AT BRANCH main
    IN "catalog_name";
  2. Now update the table by inserting a new row. Copy and paste the statement below and click Run.

    Update the table
    INSERT INTO "catalog_name".my_folder.nyc_trips
    AT BRANCH nyc_etl
    VALUES ('2013-02-10 20:00:00.000',
  3. To verify the insert, run the SQL below.

    Verify the update
    FROM "catalog_name".my_folder.nyc_trips
    AT BRANCH nyc_etl;

    The row count is now 1,000,001, which verifies that the insert was successful.

  4. To verify that the main branch wasn’t impacted by these changes and confirm that no users or dashboards querying against production data would have been impacted by the update made in the previous step, run the SQL below to query the same table on main. Notice that there's no row with the timestamp we added in the previous step.

    Verify that the main branch was not impacted
    SELECT *
    FROM "catalog_name".my_folder.nyc_trips
    AT BRANCH main
    WHERE pickup_datetime = '2013-02-10 20:00:00.000';

    Try making additional updates to the data in the nyc_trips table. See SQL Command > UPDATE.

Once you’re satisfied with the updates to the table, you can now merge these changes back to the main branch, where all your users will see the new data.

Step 4. Merge your Changes into the Main Branch

Now that you’ve verified the updates you made in your branch, merge the changes back to the main branch.

  1. To merge your changes to the nyc_trips data back to main, run the SQL below.

    Merge changes to the main branch
    MERGE BRANCH nyc_etl
    INTO main
    IN "catalog_name";
  2. To verify the merge, run the SQL below.

    Verify the data was merged to main
    SELECT *
    FROM "catalog_name".my_folder.nyc_trips
    AT BRANCH main
    WHERE pickup_datetime = '2013-02-10 20:00:00.000';

    You now see an additional row in the table on the main branch! You were able to leverage the same data manipulation capabilities of a data warehouse on data from a data lake while preserving the open format of the data. You also updated the data and verified the changes in an isolated branch without having to make a copy of the data and without disrupting your users who rely on this table.

Next: Manage Engines & Workloads