This topic describes Dremio deployment models. Dremio is a distributed system that can be deployed in a public cloud or on premises. A Dremio cluster can be co-located with one of the data sources (Hadoop or NoSQL database) or deployed separately.
The following models and associated environments are provided:
|Deployment model|Environment|
|---|---|
|AWS Edition|Cloud Service Provider Environment|
|Azure ARM|Cloud Service Provider Environment|
|Amazon EKS|Hosted Kubernetes Environment|
|Azure AKS|Hosted Kubernetes Environment|
|Hadoop using YARN|Shared Multi-Tenant Environment|
|MapR using YARN|Shared Multi-Tenant Environment|
|Standalone Cluster|Standalone - Deployed separately|
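The mapping above can also be written as a small lookup, which may be handy when scripting environment checks. This is only an illustrative sketch: the names `DEPLOYMENT_MODELS` and `models_for` are hypothetical and are not part of any Dremio tooling.

```python
# Deployment models and their environments, copied from the table above.
DEPLOYMENT_MODELS = {
    "AWS Edition": "Cloud Service Provider Environment",
    "Azure ARM": "Cloud Service Provider Environment",
    "Amazon EKS": "Hosted Kubernetes Environment",
    "Azure AKS": "Hosted Kubernetes Environment",
    "Hadoop using YARN": "Shared Multi-Tenant Environment",
    "MapR using YARN": "Shared Multi-Tenant Environment",
    "Standalone Cluster": "Standalone - Deployed separately",
}


def models_for(environment):
    """Return the deployment models listed for a given environment."""
    return [
        model
        for model, env in DEPLOYMENT_MODELS.items()
        if env == environment
    ]


if __name__ == "__main__":
    for model in models_for("Shared Multi-Tenant Environment"):
        print(model)
```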
If you plan on using a cloud service provider's environment, Dremio provides the following models, each streamlined for that provider's own deployment and management processes.
|AWS Edition|Azure ARM|
|---|---|
|Dremio on AWS, which hosts S3 and other data sources.|Dremio on Azure, which hosts ADLS and other data sources.|
If you plan on using a hosted Kubernetes environment, Dremio provides the following models, which offer a quick and easy way to deploy and manage containerized applications.
|Azure AKS|Amazon EKS|
|---|---|
|Dremio on Azure Kubernetes Service (AKS), a hosted Kubernetes environment that provides a quick and easy method for deploying and managing containerized applications.|Dremio on Amazon Elastic Kubernetes Service (Amazon EKS, formerly Amazon Elastic Container Service for Kubernetes) to deploy, manage, and scale containerized applications using Kubernetes on AWS.|
If you plan on using a shared multi-tenant environment, Dremio provides the following models that use YARN for deployment.
|Hadoop using YARN|MapR using YARN|
|---|---|
|Dremio on Hadoop in YARN deployment mode; Dremio integrates with YARN ResourceManager to secure compute resources in a shared multi-tenant environment.|Dremio on MapR in YARN deployment mode; Dremio integrates with YARN ResourceManager to secure compute resources in a shared multi-tenant environment.|
Co-locating Dremio with Hadoop/NoSQL
When Dremio is co-located with a Hadoop cluster (for example, one running HDFS or MapR-FS) or a distributed NoSQL database (such as Elasticsearch or MongoDB), it is important to use containers (cgroups, Docker, or YARN containers) to ensure adequate resources for each process.
Dremio features a high-performance asynchronous engine that minimizes the number of threads and context switches under heavy load, so unless containers are utilized, the operating system may over-allocate resources to other thread-hungry processes on the nodes.
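As a rough illustration of reserving resources per process, the sketch below splits a node's memory and CPUs between co-located services and formats the results as values for the Linux cgroup v2 `memory.max` and `cpu.max` files. The service names, the fractions, and the `plan_limits` helper are all hypothetical; this is not a Dremio sizing recommendation, and the sketch only computes values without writing to the cgroup filesystem.

```python
from dataclasses import dataclass

# Default cgroup v2 CPU scheduling period, in microseconds.
CPU_PERIOD_US = 100_000


@dataclass(frozen=True)
class Limits:
    memory_max: str  # value for the cgroup v2 "memory.max" file (bytes)
    cpu_max: str     # value for the cgroup v2 "cpu.max" file ("<quota> <period>")


def plan_limits(total_mem_bytes, total_cpus, shares):
    """Split a node's memory and CPUs between co-located services.

    `shares` maps a service name to its fraction of the node. The fractions
    are operator-chosen (hypothetical here) and must not exceed 1.0 in total.
    """
    if sum(shares.values()) > 1.0 + 1e-9:
        raise ValueError("shares add up to more than the whole node")
    plan = {}
    for name, fraction in shares.items():
        mem = int(total_mem_bytes * fraction)
        # Quota is CPU time per period; 2 full CPUs means 2 * period.
        quota = int(total_cpus * fraction * CPU_PERIOD_US)
        plan[name] = Limits(str(mem), f"{quota} {CPU_PERIOD_US}")
    return plan


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical split on a 128 GiB, 32-core node.
    shares = {"dremio-executor": 0.5, "hdfs-datanode": 0.25, "other": 0.25}
    for name, limits in plan_limits(128 * GIB, 32, shares).items():
        print(name, limits.memory_max, limits.cpu_max)
```

Fixing each service's share up front is what keeps a thread-heavy neighbor from crowding out a process that uses few threads; the same split could equally be expressed as Docker or YARN container limits.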
If you plan on creating a standalone cluster, Dremio can be deployed as a standalone on-premises cluster.
|Standalone Cluster|
|---|
|Dremio on a standalone on-premises cluster. In this deployment scenario, a Hadoop cluster is not available and the data is not in a single distributed NoSQL database.|