Deploy Dremio
This topic describes the deployment models for Dremio. Dremio is a distributed system that can be deployed in a public cloud or on-premises. A Dremio cluster can be co-located with one of its data sources (a Hadoop cluster or a NoSQL database) or deployed separately.
Deploy on Kubernetes
Kubernetes is the recommended deployment option for Dremio. For more information, see the following topics in this section:
- Kubernetes Environments – Learn about the Kubernetes environments used to deploy Dremio.
- Deploying on Kubernetes – Deploy Dremio on your Kubernetes environment.
- Configuring Your Values – Understand the configuration of your deployments in more detail.
- Managing Engines – Manage Dremio engines to optimize query execution.
Other Deployment Options
In addition to Kubernetes, Dremio supports the other deployment options described in this section.
Shared Multi-Tenant Environment
If you plan on using a shared multi-tenant environment, Dremio provides a model that uses YARN for deployment:
- Hadoop using YARN - Deploy Dremio on a Hadoop cluster through YARN. Dremio integrates with the YARN ResourceManager to secure compute resources in a shared multi-tenant environment.
Co-locating Dremio with Hadoop/NoSQL: When Dremio is co-located with a Hadoop cluster (such as HDFS) or a distributed NoSQL database (such as Elasticsearch or MongoDB), use containers (cgroups, Docker, or YARN containers) to ensure that each process has adequate resources.
Dremio features a high-performance asynchronous engine that minimizes the number of threads and context switches under heavy load. As a result, unless containers are used, the operating system may allocate a disproportionate share of resources to other, thread-hungry processes on the same nodes.
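The effect of thread counts on resource allocation can be illustrated with a rough back-of-the-envelope model. The sketch below assumes a saturated node where the scheduler divides CPU time evenly among runnable threads of equal priority, and compares that with dividing CPU by per-container weight. The function names, process names, thread counts, and weights are all hypothetical and chosen only for illustration; real schedulers and container runtimes are more nuanced.

```python
def cpu_share_by_thread(runnable_threads):
    """Approximate CPU share per process when every runnable thread
    is treated equally (simplified model, no container limits)."""
    total = sum(runnable_threads.values())
    return {name: count / total for name, count in runnable_threads.items()}


def cpu_share_by_weight(container_weights):
    """Approximate CPU share per container when CPU is divided by
    per-container weight, regardless of thread count."""
    total = sum(container_weights.values())
    return {name: weight / total for name, weight in container_weights.items()}


# Hypothetical numbers: Dremio runs few threads, a co-located process runs many.
threads = {"dremio": 8, "co_located_process": 72}
without_containers = cpu_share_by_thread(threads)

# Hypothetical equal weights assigned to two containers on the same node.
weights = {"dremio": 1, "co_located_process": 1}
with_containers = cpu_share_by_weight(weights)

for name in threads:
    print(f"{name}: {without_containers[name]:.0%} without containers, "
          f"{with_containers[name]:.0%} with containers")
```

Under these assumed numbers, the process with fewer threads receives a much smaller share of a contended CPU unless a container boundary fixes its allocation, which is the situation the recommendation above is meant to avoid.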
Standalone Cluster
If you plan on creating a standalone cluster, you can deploy Dremio as a standalone on-premises cluster:
- Standalone Cluster - Dremio on a standalone on-premises cluster. In this scenario, a Hadoop cluster is not available and the data is not in a single distributed NoSQL database.