    Overview of Dremio Cloud

    Architecture

    Dremio Cloud’s functions are divided between virtual private clouds (VPCs): Dremio’s and yours. Dremio’s VPC acts as the control plane. Your VPC acts as an execution plane. If you use multiple cloud accounts with Dremio Cloud, each VPC acts as an execution plane.

    Dremio’s Control Plane

    There are two Dremio control planes, one hosted in the United States and the other hosted in Europe, but their functions are identical. Each of the two control planes hosts Dremio Cloud’s web interface, handles query requests, hosts REST API endpoints, and manages the engines for all of the customers that are using that plane, keeping the experiences for each customer separate. The control plane also stores data about the jobs that run your organization’s queries, statistics about your organization’s use of Dremio Cloud, and other metadata.

    The Execution Plane

    The execution plane resides in your VPC and consists of one or more compute engines per subnet. Dremio Cloud provisions engines automatically as needed to execute queries. For example, if the VPC for your organization is running in AWS, Dremio Cloud’s control plane deploys compute engines as AWS EC2 instances within your VPC. The execution plane is also where your data and the metadata for your Dremio Cloud projects are stored.

    Overview of How Queries Are Run Across the Two Types of Planes

    This diagram gives a simplified account of how the two planes interact when a user logs into Dremio Cloud and runs a query:

    1. Someone authenticates to Dremio Cloud through a BI client application.
    2. The SQL proxy passes the credentials to the authentication manager, which validates the credentials and approves the authentication request.
    3. The person who authenticated issues a query to Dremio.
    4. The SQL proxy forwards the query to the query planner.
    5. The query planner notifies the engine manager of the request.
    6. The engine manager finds or starts up a compute engine that has the resources to run the query. The compute engine runs within a subnet of your VPC, which might have multiple subnets available, each with resources for additional compute engines. Within one of the subnets runs the preview engine, the compute engine that Dremio Cloud uses to return previews (or subsets) of data to its SQL runner when a user runs a query there.
    7. The query planner passes the plan for the query to the compute engine that the engine manager has designated.
    8. The compute engine passes the results of the query back to the query planner, which passes them to the SQL proxy, which then passes them to the BI client application.
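
    The steps above can be sketched as a small in-memory simulation. This is only an illustration of the hand-offs between components, not Dremio code: every class, method, and value here (`ControlPlane`, `EngineManager`, `ComputeEngine`, the subnet name, the fake plan string) is a hypothetical stand-in.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ComputeEngine:
    """Hypothetical stand-in for a compute engine in a subnet of your VPC."""
    name: str
    subnet: str

    def run(self, plan: str) -> List[str]:
        # A real engine would execute the plan against data; this only echoes it.
        return [f"rows for <{plan}> from {self.name}"]


@dataclass
class EngineManager:
    """Hypothetical stand-in for the component that finds or starts engines."""
    engines: Dict[str, ComputeEngine] = field(default_factory=dict)

    def find_or_start(self, subnet: str) -> ComputeEngine:
        # Step 6: reuse an engine in the subnet if one exists, else start one.
        if subnet not in self.engines:
            name = f"engine-{len(self.engines) + 1}"
            self.engines[subnet] = ComputeEngine(name, subnet)
        return self.engines[subnet]


class ControlPlane:
    """Hypothetical model of the SQL proxy, auth manager, and query planner."""

    def __init__(self, users: Dict[str, str]) -> None:
        self._users = users
        self.engine_manager = EngineManager()
        self.trace: List[str] = []

    def authenticate(self, user: str, password: str) -> bool:
        # Steps 1-2: the SQL proxy hands credentials to the authentication manager.
        self.trace.append("sql proxy -> authentication manager")
        return self._users.get(user) == password

    def query(self, sql: str, subnet: str = "subnet-a") -> List[str]:
        # Steps 3-4: the SQL proxy forwards the query to the query planner.
        self.trace.append("sql proxy -> query planner")
        plan = f"plan({sql})"
        # Steps 5-6: the planner notifies the engine manager, which picks an engine.
        self.trace.append("query planner -> engine manager")
        engine = self.engine_manager.find_or_start(subnet)
        # Step 7: the plan goes to the designated engine in the execution plane.
        self.trace.append(f"query planner -> {engine.name}")
        results = engine.run(plan)
        # Step 8: results travel back through the planner and proxy to the client.
        self.trace.append(f"{engine.name} -> query planner -> sql proxy -> client")
        return results
```

    Calling `ControlPlane({"ana": "s3cret"}).query("SELECT 1")` twice reuses the same simulated engine, which mirrors the "finds or starts up" wording in step 6; the `trace` list records the order of hand-offs.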

    Objects in Dremio Cloud

    When you work in the Dremio Cloud user interface, you work in or with the objects that are depicted in this diagram:

    Organization

    An organization is the virtual structure within Dremio Cloud in which a single, real-world organization or person (depending on who signs up for a Dremio Cloud account) creates and manages data-analysis projects. An organization is created during the sign-up process.

    Clouds

    A cloud represents a compute environment (AWS) in which Dremio Cloud engines run. Each cloud object holds the credentials (access key or cross-account role) for the cloud it represents and is associated with a particular Amazon Virtual Private Cloud (Amazon VPC), where compute resources are launched. A single cloud can be associated with more than one project. For more information, see Managing Clouds.

    Projects

    A project isolates compute, data and other resources needed by a team for data analysis. An organization may contain multiple projects. A project must be linked to a single cloud account. Your first project was created and linked to your cloud account as part of the sign-up process. For more information, see Managing Projects.

    This diagram shows the clouds and projects in an organization, and how they are related. A cloud can be associated with more than one project, but a project can be associated with only one cloud:
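
    The same one-to-many relationship can be expressed as a minimal data model. This is a sketch for illustration only; the class names, field names, and the `vpc_id` value are assumptions and do not correspond to any Dremio Cloud API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cloud:
    """Hypothetical cloud object tied to one Amazon VPC."""
    name: str
    vpc_id: str  # illustrative identifier, not a real Dremio field


@dataclass
class Project:
    """Hypothetical project; it references exactly one cloud."""
    name: str
    cloud: Cloud


@dataclass
class Organization:
    """Hypothetical organization holding clouds and projects."""
    name: str
    clouds: Dict[str, Cloud] = field(default_factory=dict)
    projects: Dict[str, Project] = field(default_factory=dict)

    def add_cloud(self, name: str, vpc_id: str) -> Cloud:
        cloud = Cloud(name, vpc_id)
        self.clouds[name] = cloud
        return cloud

    def add_project(self, name: str, cloud_name: str) -> Project:
        # A project must be linked to a cloud that already exists.
        if cloud_name not in self.clouds:
            raise KeyError(f"unknown cloud: {cloud_name}")
        project = Project(name, self.clouds[cloud_name])
        self.projects[name] = project
        return project

    def projects_in(self, cloud_name: str) -> List[str]:
        # One cloud can back many projects.
        return sorted(
            p.name for p in self.projects.values() if p.cloud.name == cloud_name
        )
```

    Because `Project.cloud` is a single reference rather than a list, the model cannot express a project that spans two clouds, while `projects_in` can return any number of projects for one cloud.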

    Users

    Users can participate in more than one project. When users are added to an organization by an organization administrator, the administrator assigns them roles that determine what they are allowed to do.

    Roles

    Each user is assigned the Admin role, the Public role, or both when added to an organization. These roles are currently pre-configured in Dremio Cloud. You can add additional roles to suit the needs of your organization.

    Engines

    An engine processes jobs that run queries issued by users (either through a client application or through the user interface) or by Dremio Cloud (for example, when Dremio Cloud creates a reflection that a user has defined). Compute resources for an engine are allocated in the cloud associated with the project. All engines in a project are associated with the same cloud.

    Engines are made up of one or more EC2 instances and are automatically started and stopped by the Dremio Cloud control plane. Engines can be configured to have multiple replicas, which allow for scaling up. For more information, see Managing Engines.

    Data Sources

    A data source can be a data lake, such as Amazon S3 or AWS Glue Catalog, or a relational database (referred to as an external source). For more information, see Connecting to Your Data.

    Spaces

    A space is a directory in which virtual datasets are saved. Spaces allow people in your organization to group datasets by common themes, such as purposes, departments, or geographic regions. You can also create folders within spaces to organize your datasets further. When you join a project in Dremio Cloud, your user ID is given its own home space by default. For more information, see Spaces.

    Physical Datasets and Virtual Datasets

    A physical dataset is a table representation of the data in your source. A physical dataset cannot be modified by Dremio Cloud.

    A virtual dataset is a view representation that results from filters, joins, conversions, and other transformations on physical datasets, other virtual datasets, or both.

    To learn more, see Datasets.
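
    As a rough analogy, the distinction resembles the one between a table and a view in an ordinary SQL database. The sketch below uses Python's built-in `sqlite3` module, not Dremio Cloud; the table, view, and column names are made up for illustration, and SQLite tables are writable, unlike physical datasets in Dremio Cloud.

```python
import sqlite3

# In-memory SQLite database used purely as an analogy (this is not Dremio).
conn = sqlite3.connect(":memory:")

# The table plays the role of a physical dataset: the data as it sits in the source.
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 10.0), (2, "US", 25.0), (3, "EU", 5.5)],
)

# A view plays the role of a virtual dataset: a filter over the physical data.
conn.execute(
    "CREATE VIEW eu_orders AS SELECT id, amount FROM orders WHERE region = 'EU'"
)

# A virtual dataset can also be built on another virtual dataset.
conn.execute("CREATE VIEW eu_total AS SELECT SUM(amount) AS total FROM eu_orders")

eu_rows = conn.execute("SELECT id, amount FROM eu_orders ORDER BY id").fetchall()
eu_total = conn.execute("SELECT total FROM eu_total").fetchone()[0]
source_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

    Querying either view leaves the underlying `orders` table unchanged, which is the property the analogy is meant to highlight: transformations live in the view definition, not in the source data.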

    Reflections

    A reflection is an optimized materialization of source data or a query, similar to a materialized view, that is derived from an existing virtual or physical dataset. To learn more, see Accelerating Queries with Reflections.