Using Reflections in Nessie Source Branches
You can use the branching capabilities in a Nessie source as part of your reflection management workflow. For example, you can create branches to experiment with different reflection types and patterns, and see how each reflection affects your workload. Once you’re happy, you can roll out these changes in your production branch to help end users accelerate their workloads.
-
When you create a branch, reflections that are in the base branch are not copied into the new branch. In the new branch, you must create any reflections that you want to work with, as well as set their refresh and expiration policies.
-
When you merge a branch, reflections that are in that branch are not copied into the base branch. If you want to use the reflections that were in your branch, you must recreate them in the base branch.
-
Reflections that are in one branch cannot be used to accelerate queries run against another branch. For example, suppose table T1 exists in the main branch and we create branch test. A reflection is defined on the table T1 in test. However, no reflection is defined on the table T1 in the main branch. A query runs against the table T1 in the main branch. Even if the reflection in branch
test
could match and accelerate the query, the reflection will not accelerate a query that is run in the context of branchmain
. A reflection in a particular branch is used only to accelerate queries in that branch.
Example of a Reflection in a Main Branch
You can create reflections in the default branch of a Nessie source to accelerate queries on tables and views in that branch.
In this example, table T1
is created in the main
branch of a catalog, and reflection R1
is created on that table. Data is inserted into the table, and the reflection is subsequently refreshed (either according to a schedule or manually), so that it includes the inserted data.
USE BRANCH main;
SELECT * FROM T1;
Example of a Reflection in a Branch Used for Development
In this scenario, you create a branch so that you can manipulate tables outside of the default branch. No reflections that are in the default branch are copied into the new branch. You must create any reflections that you want to use in the new branch. If you merge your branch back into the default branch or into a different branch, none of the reflections in it are copied into the branch merged into. Deleting your branch also deletes the reflections that are in that branch.
In this example, table T1
is created in the main
branch, and reflection R1
is created on that table. Then, branch B1
is created for testing changes to the structure of the table. When B1
is created, table T1
is copied over from the main
branch, but R1
is not.
In branch B1
, table T1
is altered by the addition of a column. Data is then inserted into the table. Reflection R2
is defined on the table to accelerate subsequent queries on it.
In main
branch in the meantime, data is inserted into the table and the reflection is refreshed, so that it includes the new data. Queries on the table in the main
branch continue to use the reflection defined on it earlier, if the reflection is a good match.
Eventually, the changes to the table T1
in the branch B1
are merged into the main
branch. The reflection R2
in the branch B1
is not merged into the main
branch, too. However, in the main
branch, a reflection that is identical to R2
is created. This new reflection is R3
. Queries on T1
in the main
branch can use R3
or R1
, depending on the data queried and the content of these reflections.
USE BRANCH main;
SELECT * FROM T1;
USE BRANCH B1;
SELECT * FROM T1;
Example of a Reflection in a Branch Used for Comparing Data
In this scenario, you create a branch so that you can compare data from a point in time with the current data that is in the default branch. Deleting the comparison branch also deletes the reflections that are in that branch.
In this example, table T1
is created in the main
branch, and reflection R1
is created on it. At that point, a new branch is created. Because R1
is not copied into the new branch, reflection R2
is defined on the table. While inserts can continue taking place in the main
branch, the new branch can be used for queries on the isolated data.
USE BRANCH main;
SELECT * FROM T1;
USE BRANCH B1;
SELECT * FROM T1;
Example of a Reflection with a Tag Used for Comparing Data
In this scenario, you tag the commit of a table that occurs at a point in time, so that you can compare the data at this time with data that is in the table at a later time. To accelerate queries on the tagged version of the table, you set the query context to that tag, and then create a reflection on that table. When you query the table, you set the context to the tag by using the AT
clause, like this:
SELECT * FROM <table_name> AT TAG <tag_name>
Even if the tagged reflection is refreshed, the data in it does not change, because the data in the tagged version of the table never changes.
In this example, table T1
is created in the main
branch, and then reflection R1
is created from it. Data is inserted into the table. The commit for the insert is tagged with the tag 2023-02-22
. That tagged version of the table is frozen. Inserts can still be made to the table. However, the tagged version is never changed.
Then, after the SQL context is set to the tag, reflection R2
is defined on T1
within that context. From this point, queries on the table will use R2
only if they are issued within the context of the tag and if they match the reflection.
USE BRANCH main;
SELECT * FROM T1 AT TAG 2023-02-22;
Examples of Joining across Branches
In this scenario, queries join tables or views that are in different branches. The context of a query determines which reflections it uses.
In these examples, table T1
is created in the main
branch, and reflection R1
is defined on it. Then, branch B1
is created. In this branch, table T2
is created, and reflection R2
is defined on it. Reflection R1
is not copied into branch B1
.
Example 1
SELECT issued against the main branch and using reflections R1 and R2USE BRANCH main;
SELECT * FROM T1 JOIN T2 AT BRANCH B1;
This query is run in the context of the main
branch. Reflection R1
satisfies the SELECT
from T1, because both R1
and T1
are in the main
branch. Reflection R2
satisfies the SELECT
from T2
, because the AT
clause sets the context of T2
as branch B1
, the branch in which T2
and R2
are located.
Example 2
SELECT issued against the B1 branch and using the reflections R1 and R2USE BRANCH B1;
SELECT * FROM T1 AT BRANCH main JOIN T2;
This query is run in the context of the branch B1
. Reflection R1
satisfies the SELECT
from T1
, because both R1
and T1
are in the main branch. Reflection R2
satisfies the SELECT
from T2
, because the USE
command sets the context of T2
as branch B1
, the branch in which T2
and R2
are located.
Example 3
SELECT issued against the B1 branch and using the reflection R2USE BRANCH B1;
SELECT * FROM T1 JOIN T2;
This query is run in the context of the branch B1
. The SELECT
on T1
is not satisfied by reflection R1
because that reflection was not copied from the main
branch. However, reflection R2
can satisfy the SELECT
on T2
.