Update Data in a Table
Now, you’ll make an update to the nyc_trips
table you’ve been working with. DML capabilities are enabled by Apache Iceberg, a high-performance table format that solves challenges with traditional tables in data lakes.
Step 1. Create a Branch and Update a Table
Create a branch off of the main
branch to isolate any changes that you’ll be making to your nyc_trips
table. Similar to Git’s version control semantics for code, a branch in Arctic represents an independent line of development isolated from the main branch. Arctic does this with no data copy, and enables you to update, insert, and delete data in tables on new branches without impacting the original table in the main branch. When you are ready to update the main branch from changes in a secondary branch, you can merge them in Step #4.
-
To create a new branch, run the SQL below in the SQL Runner.
Create a branchCREATE BRANCH nyc_etl
AT BRANCH main
IN "catalog_name"; -
Now update the table by inserting a new row. Copy and paste the statement below and click Run.
Update the tableINSERT INTO "catalog_name".my_folder.nyc_trips
AT BRANCH nyc_etl
VALUES ('2013-02-10 20:00:00.000',
9,
3.00,
10.0,
3.0,
13.0); -
To verify the insert, run the SQL below.
SELECT COUNT(*)
FROM "catalog_name".my_folder.nyc_trips
AT BRANCH nyc_etl;
The row count is now 1,000,001, which verifies that the insert was successful.
-
To verify that the
Verify that the main branch was not impactedmain
branch wasn’t impacted by these changes and confirm that no users or dashboards querying against production data would have been impacted by the update made in the previous step, run the SQL below to query the same table onmain
. Notice that there's no row with the timestamp we added in the previous step.SELECT *
FROM "catalog_name".my_folder.nyc_trips
AT BRANCH main
WHERE pickup_datetime = '2013-02-10 20:00:00.000';
Try making additional updates to the data in the nyc_trips
table. See SQL Command > UPDATE.
Once you’re satisfied with the updates to the table, you can now merge these changes back to the main
branch, where all your users will see the new data.
Step 4. Merge your Changes into the Main Branch
Now that you’ve verified the updates you made in your branch, merge the changes back to the main
branch.
-
To merge your changes to the
Merge changes to the main branchnyc_trips
data back tomain
, run the SQL below.MERGE BRANCH nyc_etl
INTO main
IN "catalog_name"; -
To verify the merge, run the SQL below.
Verify the data was merged to mainSELECT *
FROM "catalog_name".my_folder.nyc_trips
AT BRANCH main
WHERE pickup_datetime = '2013-02-10 20:00:00.000';You now see an additional row in the table on the
main
branch! You were able to leverage the same data manipulation capabilities of a data warehouse on data from a data lake while preserving the open format of the data. You also updated the data and verified the changes in an isolated branch without having to make a copy of the data and without disrupting your users who rely on this table.