Configuring Cloud Resources with Templates
This tutorial introduces key Dremio concepts and tasks including:
- Creating a Sonar project
- Creating a table and populating it with data
- Accelerating queries to sub-second using data reflections
This tutorial can be completed using your own data sources and datasets; for simplicity, we recommend using the datasets that we’ve provided.
For Azure accounts, you can use an ARM template for only your first project. You must use the manual procedure to add subsequent projects.
Prerequisites
Before you begin, sign up for Dremio Cloud and verify that you meet the prerequisites. After you’ve signed up, you’ll be logged into your organization and navigated to your organization homepage.
You can now get started with adding a Sonar project.
Step 1. Add a Sonar Project
This tutorial uses the CloudFormation template (CFT) for connecting an AWS account or the Azure Resource Manager (ARM) template for connecting an Azure account. Templates are the recommended method for configuring project resources.
If you are using the CFT for AWS, see the annotated CloudFormation Template for more information about the resources created and permissions granted to Dremio.
To manually create cloud and project resources instead of using templates, see Configuring Your Cloud Resources Manually.
After addressing the prerequisites, follow these steps to create your first Sonar project:
-
On your organization homepage, click Add Sonar Project on the Sonar card.
-
In the Add Sonar Project dialog, for Project name, specify a name to identify your project. You can change the name of the project after creation.
-
For Arctic catalog name, specify a name to identify the primary catalog in your project. The name cannot be changed once the catalog has been created. The name cannot include the following special characters:
/
,:
,[
, or]
. -
Select a cloud provider.
-
Select the region in which compute resources and the project store will be created. For a list of supported regions, see Supported Regions.
-
(Optional) Add one or more tags to help you identify compute resources for this project in your cloud provider account.
-
Click Next.
To continue the configuration, follow the steps for your cloud provider.
- AWS
- Azure
- Click Launch CloudFormation Template, which will open the AWS console in a new browser tab.
- On the “Quick create stack” page in the AWS Console, for Stack name, enter a name to identify the stack. This name must be unique to your AWS account and cannot include underscores.
- For VPC, select the VPC in which compute resources will be created.
- For Subnets, select one or more subnets where compute resources will be created.
- For Project Store, a unique name has been generated for the S3 bucket that will be created in your AWS account as the metadata store for your project. You can specify a different name here but it must be unique across AWS.
- For Encryption Type, SSE-S3 is the default encryption mechanism for the project store S3 bucket. You can select SSE-KMS (AWS Managed Key) or SSE-KMS (Customer Managed Key) for additional encryption. If you select SSE-KMS (Customer Managed Key), specify the KMS Key ARN.
- Check I acknowledge that AWS CloudFormation might create IAM resources.
- Click Create stack.
It will take about 5 minutes to create the required storage and compute resources for the project.
When stack creation has completed in the AWS Console, return to the Dremio Console to start creating your first table.
- For Object ID, paste the ID of the service principal object associated with the registered application.
- For Tenant ID, paste the ID of the Azure tenant.
- For Application (client) ID, paste the application ID of the registered application.
- For Client Secret, paste the value of the client secret that you created in the registered application.
- Click Launch ARM. A new Azure Portal > Custom deployment window opens in the browser. If you are not logged in to Azure, you will be prompted to do so in the Azure Portal window.
- Complete the following fields in the Azure Portal > Custom deployment window. Do not edit the ARM template directly.
- For Subscription, enter the Azure subscription to use.
- For Resource group, select the resource group where the Dremio resources will be created. You must use a new resource group. If you did not create a new resource group when completing the prerequisites, use the Create new link under the dropdown-menu to do so now. You must have permission to create resources in this resource group.
- For Region, Azure automatically specifies the resource group's region.
- For Resource Deployment Location, enter the region where the resources in the ARM template will be created.
- For Control Plane Private Endpoint, select true if a private link endpoint should be created to connect engines to Dremio’s control plane. Select false to connect engines to Dremio’s control plane using the public internet.
- For Resource Tags, update the default JSON key/value pairs to apply as tags for the Azure resources.
- At the bottom of the Azure Portal > Custom deployment window, click Review + create.
- Review the configuration in the Azure Portal > Custom deployment window and click Create.
- Close the Azure Portal window.
Azure executes the ARM template and starts the deployment, which may take several minutes to complete. Azure notifies Dremio when the deployment is complete, and Dremio creates the cloud, project, and catalog.
IMPORTANT: If Azure fails to create the deployment script provisioning, for example due to a timeout when provisioning the container that runs the script, Azure does not notify Dremio and any Azure resources that were created in the resource group are not deleted. You must manually delete any Azure resources that were created in the resource group before retrying the deployment.
After Dremio finishes creating the cloud, project, and catalog, the Dremio console displays the Welcome to Dremio Sonar modal window and the Datasets page for the project.
If you encounter errors during resource creation for AWS, see Dremio Cloud Troubleshooting for AWS.
In case of any errors during ARM template deployment on Azure, refer to the Azure documentation to troubleshoot common Azure deployment errors.
Step 2. Create a Table
For this step, you will be working with the NYC-taxi-trips.csv
file (330+ million rows) in an Amazon S3 bucket.
You will be creating a folder and a table in your catalog. In each of the steps, replace catalog_name
with the name of the catalog you created in the previous step.
Create a Folder
To create a folder where your table will reside:
-
Click in the side navigation bar.
-
Copy and paste the SQL below and replace
Create folder in catalogcatalog_name
with the name of your catalog. Click Run.CREATE FOLDER "catalog_name"."my_folder";
Create a Table
To create a table in your catalog:
-
Click in the side navigation bar.
-
Copy and paste the SQL below and click Run.
Create table in folderCREATE TABLE "catalog_name"."my_folder"."nyc_trips" (
pickup_datetime TIMESTAMP,
passenger_count INT,
trip_distance_mi FLOAT,
fare_amount FLOAT,
tip_amount FLOAT,
total_amount FLOAT
); -
To populate the
Populate table with datanyc_trips
table with sample data, run the SQL below.COPY INTO "catalog_name"."my_folder"."nyc_trips"
FROM '@Samples/samples.dremio.com/' FILES('NYC-taxi-trips.csv'); -
To query the populated data, run the SQL below.
Query tableSELECT *
FROM "catalog_name"."my_folder"."nyc_trips";
Now that you've created your first table in the catalog, you can accelerate your queries on this table using a reflection.
Step 3. Accelerate Queries with a Reflection
Queries on the nyc_trips
table are fast but queries on this table or views built from this table triggered from dashboards need to be sub-second. To optimize queries and get sub-second response times on 330+ million rows, you will create a data reflection. Data reflections organize and optimize the data close to Dremio Sonar’s query engine, using techniques such as pre-computation, columnarization, compression, sorting, partitioning, and data aggregation.
To create an aggregation reflection:
-
Run the following SQL in the SQL Runner:
Create reflection on the tableALTER TABLE "catalog_name"."my_folder"."nyc_trips"
CREATE AGGREGATE REFLECTION "taxi_reflection"
USING DIMENSIONS ("pickup_datetime")
MEASURES ("passenger_count",
"trip_distance_mi",
"fare_amount",
"tip_amount",
"total_amount");It takes only a few seconds to create this reflection. Creating a reflection on the table allows views created on top of that data to take advantage of the acceleration as well.
You can also create reflections using the Dremio console. See Creating Reflections to try it out.
- After you have created the reflection, run the query below, which takes advantage of the aggregation reflection.
The reflection was created on the nyc_trips
table but will also accelerate queries on views built on this table. When a query like this is triggered from the SQL Runner or a dashboard, data from all 330+ million rows is used to return the result.
SELECT DATE_PART('DAY', "catalog_name"."my_folder"."nyc_trips"."pickup_datetime") AS "pickup_day",
SUM("catalog_name"."my_folder"."nyc_trips"."passenger_count") AS "passenger_count"
FROM "catalog_name"."my_folder"."nyc_trips"
GROUP BY pickup_day
ORDER BY pickup_day;
Notice the query runs in less than a second.
Step 4. Add Dataset Info to Enhance Discoverability
For users who will be querying and building on the table that you created, add a markdown description and label to help them understand the data.
-
Click in the side navigation bar and click on
my_folder
in the upper left corner. -
In
my_folder
, hover over thenyc_trips
table that you created and click on the right. Dremio takes you to the Datasets page fornyc_trips
, which is indicated by the name and path in the top left. -
In the Details tab, on the Dataset Overview panel on the right, click to add a label that will help users identify the dataset, type
public-data
, and press Enter. -
In the Details tab, in the Wiki section, click Edit wiki to add markdown to describe the table and provide helpful content for the user to get started with the data (i.e., examples, usage notes, point of contact for questions or issues). Copy and paste the example below or write your own and click Save.
Example wikiThis table contains taxi trips data from NYC. It can be used to determine the taxi trends i.e. frequency of trips, distance traveled, and average fares or tips.
Questions? Contact gnarly@dremio.com
### Columns
`pickup_datetime`: Time at which the passenger was picked up.
`passenger_count`: The number of passengers that were picked up for the taxi ride.
`trip_distance_mi`: The total distance of the trip from pickup to drop off locations in miles.
* Max: 351.0
`fare_amount`: The dollar amount that the ride cost.
* Max: 158995.81
`tip_amount`: The dollar amount that the customer tipped in addition to the fare_amount.
* Max: 888.19
`total_amount`: The total dollar amount that the customer paid (fare + tip).
### Examples
`SELECT DATE_PART('DAY', "catalog_name"."my_folder"."nyc_trips"."pickup_datetime") AS "pickup_day",
SUM("catalog_name"."my_folder"."nyc_trips"."passenger_count") AS "passenger_count"
FROM "catalog_name"."my_folder"."nyc_trips"
GROUP BY pickup_day
ORDER BY pickup_day;`
Now your users can easily understand and query the my_folder.nyc_trips
table you created.
Wrap-up and Next Steps
Within minutes after creating your first project, you were able to create and populate a table, accelerate queries using reflections, and make it easy to understand and query using the wiki and label capabilities.
Key Takeaways
- You can create a table in Dremio and populate it with data quickly, which enables you to run DML operations on your data out of the box as you would in a traditional data warehouse.
- You can accelerate your queries (triggered from the SQL Runner, dashboards, ODBC/JDBC) using data reflections.
- You can enable your users to quickly understand, query, and work with data using the wiki and labels.
Clean Up (Optional)
Run the following SQL commands below from the SQL Runner to drop the objects that were created in this tutorial.
Drop objectsDROP TABLE "catalog_name"."my_folder"."nyc_trips";
Next Steps
Continue learning about and familiarizing yourself with the key Dremio concepts and features covered in this tutorial by using the following resources:
- How Dremio Uses Apache Iceberg
- Accelerating Queries with Reflections and Best Practices
- Objects in Dremio Cloud
To apply the same concepts with your own data, see Connecting to Your Data.
Have Questions?
If you have questions about the concepts in this tutorial or other Dremio features, reach out to us using the chat feature in the bottom-right corner of the Dremio application.