This topic describes Dremio deployment models. Dremio is a distributed system that can be deployed in a public cloud or on premises. A Dremio cluster can be co-located with one of the data sources (Hadoop or NoSQL database) or deployed separately.
The following models and associated environments are provided:
Cloud Service Provider Environment
If you plan on using a cloud service provider's environment, Dremio provides the following models, each streamlined for that cloud provider's deployment and management processes.
|AWS Edition||Azure ARM|
|Dremio on Amazon Web Services (AWS), which hosts S3 and other data sources.||Dremio on Azure, which hosts ADLS and other data sources.|
Hosted Kubernetes Environment
If you plan on using a hosted Kubernetes environment, Dremio provides the following models, which offer a quick and easy way to deploy and manage containerized applications.
|Azure AKS||Amazon EKS||Google Cloud GKE|
|Dremio on Azure Kubernetes Service (AKS) to deploy, manage, and scale containerized applications using Kubernetes on Azure.||Dremio on Amazon Elastic Kubernetes Service (Amazon EKS) to deploy, manage, and scale containerized applications using Kubernetes on AWS.||Dremio on Google Kubernetes Engine (GKE) to deploy, manage, and scale containerized applications using Kubernetes on Google Cloud.|
Shared Multi-Tenant Environment
If you plan on using a shared multi-tenant environment, Dremio provides the following models that use YARN for deployment.
|Hadoop using YARN||MapR using YARN|
|Dremio on Hadoop in YARN deployment mode. Dremio integrates with the YARN ResourceManager to secure compute resources in a shared multi-tenant environment.||Dremio on MapR in YARN deployment mode. Dremio integrates with the YARN ResourceManager to secure compute resources in a shared multi-tenant environment.|
Co-locating Dremio with Hadoop/NoSQL: When Dremio is co-located with a Hadoop cluster (such as HDFS or MapR-FS) or a distributed NoSQL database (such as Elasticsearch or MongoDB), it is important to use containers (cgroups, Docker, or YARN containers) to ensure adequate resources for each process.
Dremio features a high-performance asynchronous engine that minimizes the number of threads and context switches under heavy load. Because Dremio runs so few threads, the operating system may over-allocate resources to other, thread-hungry processes on the same nodes unless containers are used to enforce each process's share.
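As an illustration of the resource partitioning described above, the following minimal Python sketch turns per-process reservations on a shared node into cgroup v2 limit values (`cpu.max` and `memory.max`). The `Share` class, the `cgroup_limits` function, the process names, and the node sizes are all hypothetical and are not part of Dremio or a sizing recommendation. The sketch only computes the values; it does not write them to the cgroup filesystem.

```python
from dataclasses import dataclass
from typing import Dict, List

# Default cgroup v2 CPU period, in microseconds.
CPU_PERIOD_US = 100_000


@dataclass(frozen=True)
class Share:
    """Resources reserved for one co-located process (hypothetical helper)."""
    name: str
    cpu_cores: float
    memory_gib: float


def cgroup_limits(node_cores: float, node_memory_gib: float,
                  shares: List[Share]) -> Dict[str, Dict[str, str]]:
    """Return cgroup v2 values for each share, rejecting overcommitted plans."""
    total_cpu = sum(s.cpu_cores for s in shares)
    total_mem = sum(s.memory_gib for s in shares)
    if total_cpu > node_cores or total_mem > node_memory_gib:
        raise ValueError("reservations exceed node capacity")

    limits = {}
    for s in shares:
        # cpu.max is "<quota> <period>": the quota of CPU time per period.
        quota_us = int(s.cpu_cores * CPU_PERIOD_US)
        limits[s.name] = {
            "cpu.max": f"{quota_us} {CPU_PERIOD_US}",
            # memory.max is a hard limit expressed in bytes.
            "memory.max": str(int(s.memory_gib * 1024 ** 3)),
        }
    return limits


if __name__ == "__main__":
    # Illustrative numbers only: a 16-core, 128 GiB node shared by two processes.
    plan = cgroup_limits(16, 128, [
        Share("dremio-executor", cpu_cores=8, memory_gib=64),
        Share("hdfs-datanode", cpu_cores=4, memory_gib=32),
    ])
    for name, values in plan.items():
        print(name, values)
```

Rejecting a plan whose reservations exceed the node's capacity is the point of the exercise: each co-located process gets a guaranteed share rather than competing for whatever the scheduler hands out.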
If you plan on creating a standalone cluster, you can deploy Dremio as a standalone on-premises cluster.
|Dremio on a standalone on-premises cluster. In this deployment scenario, a Hadoop cluster is not available and the data is not in a single distributed NoSQL database.|
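The environment-to-model mapping described in this topic can be summarized as a simple lookup. The following Python sketch is only a restatement of the tables above. The `DEPLOYMENT_MODELS` dictionary and the `models_for` function are hypothetical names used for illustration and do not correspond to any Dremio API.

```python
from typing import Dict, List

# Environments and their deployment models, as listed in this topic.
DEPLOYMENT_MODELS: Dict[str, List[str]] = {
    "cloud service provider": ["AWS Edition", "Azure ARM"],
    "hosted kubernetes": ["Azure AKS", "Amazon EKS", "Google Cloud GKE"],
    "shared multi-tenant": ["Hadoop using YARN", "MapR using YARN"],
    "standalone": ["Standalone on-premises cluster"],
}


def models_for(environment: str) -> List[str]:
    """Return the deployment models listed for an environment name."""
    key = environment.strip().lower()
    if key not in DEPLOYMENT_MODELS:
        known = ", ".join(sorted(DEPLOYMENT_MODELS))
        raise KeyError(f"unknown environment {environment!r}; expected one of: {known}")
    return DEPLOYMENT_MODELS[key]


if __name__ == "__main__":
    print(models_for("Hosted Kubernetes"))
```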