Update Data in a Table
Now, you’ll make an update to the
nyc_trips table you’ve been working with. DML capabilities are enabled by Apache Iceberg, a high-performance table format that solves challenges with traditional tables in data lakes.
Step 1. Create a Branch and Update a Table
Create a branch off of the
main branch to isolate any changes that you’ll be making to your
nyc_trips table. Similar to Git’s version control semantics for code, a branch in Arctic represents an independent line of development isolated from the main branch. Arctic does this with no data copy, and enables you to update, insert, and delete data in tables on new branches without impacting the original table in the main branch. When you are ready to update the main branch from changes in a secondary branch, you can merge them in Step #4.
To create a new branch, run the SQL below in the SQL Runner.Create a branch
CREATE BRANCH nyc_etl
AT BRANCH main
Now update the table by inserting a new row. Copy and paste both statements below and click Run.Update the table
USE BRANCH nyc_etl
INSERT INTO "catalog_name".my_folder.nyc_trips
VALUES ('2013-02-10 20:00:00.000',
To verify the insert, run the SQL below.Verify the update
AT BRANCH nyc_etl;
The row count is now 11, which verifies that the insert was successful.
To verify that theVerify that the main branch was not impacted
mainbranch wasn’t impacted by these changes and confirm that no users or dashboards querying against production data would have been impacted by the update made in the previous step, run the SQL below to query the same table on
AT BRANCH main;tip
Try making additional updates to the data in the
nyc_tripstable. See SQL Command > UPDATE.
Once you’re satisfied with the updates to the table, you can now merge these changes back to the
main branch, where all your users will see the new data.
Step 4. Merge your Changes into the Main Branch
Now that you’ve verified the updates you made in your branch, merge the changes back to the
To merge your changes to theMerge changes to the main branch
nyc_tripsdata back to
main, run the SQL below.
MERGE BRANCH nyc_etl
To verify the merge, run the SQL below.Verify the data was merged to main
AT BRANCH main;
You now see an additional row in the table on the
mainbranch! You were able to leverage the same data manipulation capabilities of a data warehouse on data from a data lake while preserving the open format of the data. You also updated the data and verified the changes in an isolated branch without having to make a copy of the data and without disrupting your users who rely on this table.