Skip to main content
Version: current [25.x]

Using Reflections in Nessie Source Branches

You can use the branching capabilities in a Nessie source as part of your reflection management workflow. For example, you can create branches to experiment with different reflection types and patterns, and see how each reflection affects your workload. Once you’re happy, you can roll out these changes in your production branch to help end users accelerate their workloads.

note
  • When you create a branch, reflections that are in the base branch are not copied into the new branch. In the new branch, you must create any reflections that you want to work with, as well as set their refresh and expiration policies.

  • When you merge a branch, reflections that are in that branch are not copied into the base branch. If you want to use the reflections that were in your branch, you must recreate them in the base branch.

  • Reflections that are in one branch cannot be used to accelerate queries run against another branch. For example, suppose table T1 exists in the main branch and we create branch test. A reflection is defined on the table T1 in test. However, no reflection is defined on the table T1 in the main branch. A query runs against the table T1 in the main branch. Even if the reflection in branch test could match and accelerate the query, the reflection will not accelerate a query that is run in the context of branch main. A reflection in a particular branch is used only to accelerate queries in that branch.

Example of a Reflection in a Main Branch

You can create reflections in the default branch of a Nessie source to accelerate queries on tables and views in that branch.

In this example, table T1 is created in the main branch of a catalog, and reflection R1 is created on that table. Data is inserted into the table, and the reflection is subsequently refreshed (either according to a schedule or manually), so that it includes the inserted data.

SELECT on table T1 in branch main using reflection R1, if reflection is a good match
USE BRANCH main;
SELECT * FROM T1;

Example of a Reflection in a Branch Used for Development

In this scenario, you create a branch so that you can manipulate tables outside of the default branch. No reflections that are in the default branch are copied into the new branch. You must create any reflections that you want to use in the new branch. If you merge your branch back into the default branch or into a different branch, none of the reflections in it are copied into the branch merged into. Deleting your branch also deletes the reflections that are in that branch.

In this example, table T1 is created in the main branch, and reflection R1 is created on that table. Then, branch B1 is created for testing changes to the structure of the table. When B1 is created, table T1 is copied over from the main branch, but R1 is not.

In branch B1, table T1 is altered by the addition of a column. Data is then inserted into the table. Reflection R2 is defined on the table to accelerate subsequent queries on it.

In main branch in the meantime, data is inserted into the table and the reflection is refreshed, so that it includes the new data. Queries on the table in the main branch continue to use the reflection defined on it earlier, if the reflection is a good match.

Eventually, the changes to the table T1 in the branch B1 are merged into the main branch. The reflection R2 in the branch B1 is not merged into the main branch, too. However, in the main branch, a reflection that is identical to R2 is created. This new reflection is R3. Queries on T1 in the main branch can use R3 or R1, depending on the data queried and the content of these reflections.

SELECT on table T1 in branch main using reflection R1 or R3, depending on which reflection matches best
USE BRANCH main;
SELECT * FROM T1;
SELECT on table T1 in branch B1 using reflection R2, if the reflection is a good match
USE BRANCH B1;
SELECT * FROM T1;

Example of a Reflection in a Branch Used for Comparing Data

In this scenario, you create a branch so that you can compare data from a point in time with the current data that is in the default branch. Deleting the comparison branch also deletes the reflections that are in that branch.

In this example, table T1 is created in the main branch, and reflection R1 is created on it. At that point, a new branch is created. Because R1 is not copied into the new branch, reflection R2 is defined on the table. While inserts can continue taking place in the main branch, the new branch can be used for queries on the isolated data.

SELECT on table T1 in branch main using reflection R1, if the reflection is a good match
USE BRANCH main;
SELECT * FROM T1;
SELECT on table T1 in branch B1 using reflection R2, if the reflection is a good match
USE BRANCH B1;
SELECT * FROM T1;

Example of a Reflection with a Tag Used for Comparing Data

In this scenario, you tag the commit of a table that occurs at a point in time, so that you can compare the data at this time with data that is in the table at a later time. To accelerate queries on the tagged version of the table, you set the query context to that tag, and then create a reflection on that table. When you query the table, you set the context to the tag by using the AT clause, like this:

A SELECT statement setting its context
SELECT * FROM <table_name> AT TAG <tag_name>

Even if the tagged reflection is refreshed, the data in it does not change, because the data in the tagged version of the table never changes.

In this example, table T1 is created in the main branch, and then reflection R1 is created from it. Data is inserted into the table. The commit for the insert is tagged with the tag 2023-02-22. That tagged version of the table is frozen. Inserts can still be made to the table. However, the tagged version is never changed.

Then, after the SQL context is set to the tag, reflection R2 is defined on T1 within that context. From this point, queries on the table will use R2 only if they are issued within the context of the tag and if they match the reflection.

SELECT against tagged version of table T1 in branch main using reflection R2, if the reflection is a good match
USE BRANCH main;
SELECT * FROM T1 AT TAG 2023-02-22;

Examples of Joining across Branches

In this scenario, queries join tables or views that are in different branches. The context of a query determines which reflections it uses.

In these examples, table T1 is created in the main branch, and reflection R1 is defined on it. Then, branch B1 is created. In this branch, table T2 is created, and reflection R2 is defined on it. Reflection R1 is not copied into branch B1.

Example 1

SELECT issued against the main branch and using reflections R1 and R2
USE BRANCH main;
SELECT * FROM T1 JOIN T2 AT BRANCH B1;

This query is run in the context of the main branch. Reflection R1 satisfies the SELECT from T1, because both R1 and T1 are in the main branch. Reflection R2 satisfies the SELECT from T2, because the AT clause sets the context of T2 as branch B1, the branch in which T2 and R2 are located.

Example 2

SELECT issued against the B1 branch and using the reflections R1 and R2
USE BRANCH B1;
SELECT * FROM T1 AT BRANCH main JOIN T2;

This query is run in the context of the branch B1. Reflection R1 satisfies the SELECT from T1, because both R1 and T1 are in the main branch. Reflection R2 satisfies the SELECT from T2, because the USE command sets the context of T2 as branch B1, the branch in which T2 and R2 are located.

Example 3

SELECT issued against the B1 branch and using the reflection R2
USE BRANCH B1;
SELECT * FROM T1 JOIN T2;

This query is run in the context of the branch B1. The SELECT on T1 is not satisfied by reflection R1 because that reflection was not copied from the main branch. However, reflection R2 can satisfy the SELECT on T2.