Deploy Dremio
This topic describes the deployment models for Dremio. Dremio is a distributed system that can be deployed in a public cloud or on-premises. A Dremio cluster can be co-located with one of its data sources (a Hadoop cluster or a NoSQL database) or deployed separately.
Deploy to Kubernetes
Kubernetes is the recommended deployment option for Dremio. See the following topics in this section:
- Deploying to Kubernetes - The procedure to deploy Dremio to your Kubernetes environment.
- Configuring Your Values - Understand the configuration of your deployments in more detail.
- Managing Engines - Once deployed, manage Dremio engines to optimize the execution of your queries.
Other Deployment Options
In addition to Kubernetes, Dremio supports the deployment options described in this section.
Shared Multi-Tenant Environment
If you plan on using a shared multi-tenant environment, Dremio provides a model that uses YARN for deployment:
- Hadoop using YARN - Dremio deployed on a Hadoop cluster using YARN. Dremio integrates with the YARN ResourceManager to secure compute resources in a shared multi-tenant environment.
Co-locating Dremio with Hadoop/NoSQL: When Dremio is co-located with a Hadoop cluster (such as HDFS) or a distributed NoSQL database (such as Elasticsearch or MongoDB), it is important to use containers (cgroups, Docker, or YARN containers) to ensure adequate resources for each process.
Dremio features a high-performance asynchronous engine that minimizes the number of threads and context switches under heavy load. As a result, unless containers are used, the operating system may over-allocate resources to other, more thread-hungry processes on the same nodes.
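As a rough illustration of the container-based isolation described above, the following Python sketch splits a node's CPU and memory between co-located processes and prints the matching `docker run` resource flags. The node size, the process names, and the 75/25 split are hypothetical values chosen for the example, not Dremio sizing guidance; `--cpus` and `--memory` are standard Docker resource-limit flags.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Limit:
    """CPU and memory ceiling for one containerized process."""
    name: str
    cpus: float
    memory_gb: int

    def docker_flags(self) -> str:
        # --cpus and --memory are standard `docker run` resource flags.
        return f"--cpus={self.cpus:g} --memory={self.memory_gb}g"


def plan_node(total_cpus: int, total_memory_gb: int,
              shares: Dict[str, float]) -> List[Limit]:
    """Split a node's resources by fractional share (hypothetical policy)."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1.0")
    return [
        Limit(name, round(total_cpus * share, 2), int(total_memory_gb * share))
        for name, share in shares.items()
    ]


# Hypothetical 16-core, 128 GB node shared by a Dremio executor
# and an HDFS DataNode; the names and the split are assumptions.
limits = plan_node(16, 128, {"dremio-executor": 0.75, "hdfs-datanode": 0.25})
for limit in limits:
    print(f"{limit.name}: {limit.docker_flags()}")
```

Giving every co-located process an explicit ceiling means a low-thread-count process is not crowded out simply because its neighbors run more threads.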
Standalone Cluster
If you plan on creating a standalone cluster, Dremio can be deployed as a standalone on-premises cluster:
- Standalone Cluster - Dremio on a standalone on-premises cluster. In this scenario, a Hadoop cluster is not available and the data is not in a single distributed NoSQL database.