AWS EC2 Deployment
This section describes how to deploy Dremio on Amazon EC2.
The following ports must be open:
|Purpose||Port||From||To|
|UI (HTTPS)||9047||Corporate network (end users)||Coordinators|
|ODBC/JDBC clients (e.g., Tableau, Power BI)||31010||Corporate network (end users)||Coordinators|
|ZooKeeper (internal)||2181||Other Dremio nodes (coordinators and executors)||Coordinators|
|Inter-node communication||45678||Other Dremio nodes||Executors|
|Data source reads||Varies||All Dremio nodes||Data source nodes|
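Once instances are running, one way to confirm that the fixed ports in the table are reachable is a plain TCP connection attempt. The following is a minimal sketch using only Python's standard library. The names `DREMIO_PORTS`, `port_is_open`, and `check_ports` are illustrative, not part of Dremio, and a successful connection only shows that the port accepts TCP connections from where the script runs, not that the service behind it is healthy.

```python
import socket

# Fixed ports from the table above, keyed by purpose.
DREMIO_PORTS = {
    "ui": 9047,
    "odbc_jdbc": 31010,
    "zookeeper": 2181,
    "inter_node": 45678,
}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


def check_ports(host, purposes, timeout=3.0):
    """Map each requested purpose to whether its port is reachable on host."""
    return {
        name: port_is_open(host, DREMIO_PORTS[name], timeout)
        for name in purposes
    }
```

For example, `check_ports("coordinator.example.internal", ["ui", "odbc_jdbc"])` run from a machine on the corporate network would report on the two client-facing ports; the host name here is a placeholder.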
Please refer to System Requirements for base requirements. The following are additional requirements/instructions for AWS EC2 deployments.
- An Amazon AWS account
- One EC2 instance for each coordinator and executor node.
- At least one coordinator node (one coordinator node may also serve as the master node) and at least one executor node are required.
- Master and coordinator node storage is used for metadata and logging.
- Executor node storage is used for logging and spilling. Dremio can spill to multiple locally attached disks for high performance.
- S3 bucket(s) are used for the reflection cache. Please refer to the Amazon S3 section of the Distributed Storage Guide for configuring Amazon S3 as Dremio's distributed storage.
Setting up Amazon EC2 Instances
Amazon AWS Account
If you do not already have an account, please create one. These instructions do not cover the sign-up process. Amazon EC2 offers free-tier-eligible instances.
The following instructions for setting up your instances must be repeated for the Dremio master, coordinator, and executor machines.
Once you have signed up for an Amazon account, log in to Amazon Web Services, click My Account, and navigate to the Amazon EC2 Console.
Select an AMI
In this example, we’ll pick the Red Hat Linux Server 64-bit OS.
Select Instance Type
Select the ‘m5d.2xlarge’ instance type for coordinators and the ‘m5d.4xlarge’ instance type for executors. Please refer to the System Requirements section for base hardware requirements.
Coordinator instance type: m5d.2xlarge (recommended)
Executor instance type: m5d.4xlarge (recommended)
Configure Number of Instances
As previously mentioned, each coordinator and executor node in your environment requires its own EC2 instance. At least one coordinator node (one coordinator node may also serve as the master node) and at least one executor node are required. Set the number of instances according to the role you are creating (master, coordinator, or executor) and how many nodes of that role you want.
[info] Best Practice
For better network connectivity, select "Placement group" and put your instances in a cluster placement group so they're deployed close to each other.
For the node that will be the Dremio master, increase the root disk size to 100 GB.
Master and coordinator node storage is used for metadata and logging. Executor node storage is used for logging and for spilling during aggregations and joins (AGG/JOIN). Dremio can spill to multiple locally attached disks for high performance.
Instance Description
Give instances names that identify their types: Dremio master, coordinator, or executor.
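The console choices above (instance type, number of instances, placement group, master root disk size, and instance name) can also be written down as a set of launch parameters. The following is a minimal sketch that only builds a dictionary shaped like the arguments of the EC2 RunInstances API; it does not contact AWS. The function name `launch_params`, the `dremio-<role>` naming scheme, the use of the coordinator instance type for the master, and the root device name `/dev/sda1` are assumptions for illustration; the root device name in particular depends on the AMI.

```python
# Instance types recommended in this section, keyed by Dremio role.
INSTANCE_TYPES = {
    "master": "m5d.2xlarge",  # assumption: the master is sized like a coordinator
    "coordinator": "m5d.2xlarge",
    "executor": "m5d.4xlarge",
}


def launch_params(role, count, ami_id, key_name, security_group_id,
                  placement_group=None, root_device="/dev/sda1"):
    """Build keyword arguments shaped for the EC2 RunInstances API."""
    if role not in INSTANCE_TYPES:
        raise ValueError(f"unknown role: {role!r}")
    if count < 1:
        raise ValueError("count must be at least 1")

    params = {
        "ImageId": ami_id,
        "InstanceType": INSTANCE_TYPES[role],
        "MinCount": count,
        "MaxCount": count,
        "KeyName": key_name,
        "SecurityGroupIds": [security_group_id],
        # The Name tag is what the console shows as the instance name.
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": f"dremio-{role}"}],
        }],
    }
    if placement_group:
        # Cluster placement group, per the best practice above.
        params["Placement"] = {"GroupName": placement_group}
    if role == "master":
        # 100 GB root volume for the master node.
        params["BlockDeviceMappings"] = [{
            "DeviceName": root_device,  # assumption: AMI-specific device name
            "Ebs": {"VolumeSize": 100},
        }]
    return params
```

If the AWS SDK for Python is installed and credentials are configured, a dictionary built this way could be passed as `boto3.client("ec2").run_instances(**params)`. The AMI ID, key pair name, and security group ID are values you would supply from your own account.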
Define a Security Group
Create a new security group (or modify an existing one) with rules allowing access to ports 22 (SSH), 9047 (UI), and 31010 (ODBC/JDBC clients). As the warning in the console highlights, you should limit the source to your corporate IP ranges.
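The security rules described above can be sketched as data. The example below builds one TCP ingress rule per port, each limited to the given IPv4 ranges, in the `IpPermissions` shape used by the EC2 AuthorizeSecurityGroupIngress API. It does not contact AWS. The function name `ingress_rules` and the sample range `203.0.113.0/24` (a documentation-only address block) are illustrative assumptions; substitute your real corporate ranges.

```python
import ipaddress

# Ports named in this section: SSH, the Dremio UI, and ODBC/JDBC clients.
CLIENT_PORTS = {"ssh": 22, "ui": 9047, "odbc_jdbc": 31010}


def ingress_rules(corporate_cidrs):
    """Build one TCP ingress rule per port, limited to the given IPv4 ranges."""
    ranges = []
    for cidr in corporate_cidrs:
        # Raises ValueError if the CIDR is malformed or has host bits set.
        network = ipaddress.IPv4Network(cidr)
        if network.prefixlen == 0:
            raise ValueError("refusing to open ports to 0.0.0.0/0")
        ranges.append({"CidrIp": str(network), "Description": "corporate network"})
    if not ranges:
        raise ValueError("at least one corporate CIDR range is required")

    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": list(ranges),
        }
        for port in CLIENT_PORTS.values()
    ]
```

With the AWS SDK for Python, the returned list could be passed as `IpPermissions` to `authorize_security_group_ingress` together with the `GroupId` of your security group. Rejecting `0.0.0.0/0` outright is a design choice of this sketch, made to mirror the advice to restrict the source.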
Launch Instance and Create Key Pair
Amazon EC2 uses public–key cryptography to encrypt and decrypt login information. Public–key cryptography uses a public key to encrypt a piece of data, such as a password, then the recipient uses the private key to decrypt the data. The public and private keys are known as a key pair.
You will be asked to choose a pem-key, which will be used to log in to these instances. If this is your first time, you can generate a new pem-key and download it to your computer.
If you lose your pem-key, there is no way to recover it, and you will lose access to any instances associated with it.
Create a new key pair, name it “Dremio_Cluster”, and download the key pair (.pem) file to your local machine. Click Launch Instance.
Repeat these steps for creating the instances for coordinators and executors.
Installing and Configuring Dremio
At this point, if you have already installed and configured Dremio, you should be able to SSH into the various coordinator and executor instances, and the nodes should be able to contact one another within the cluster.
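As an illustration of the SSH step, the sketch below restricts the downloaded .pem file to owner read-only and builds the matching `ssh` command as an argument list. The function name `ssh_command` is hypothetical, and the default user `ec2-user` is an assumption that depends on the AMI you selected. The permission change matters because OpenSSH rejects private key files that other users can read.

```python
import os
import stat


def ssh_command(pem_path, host, user="ec2-user"):
    """Restrict the key file to owner read-only and return an ssh argv list."""
    # Equivalent to `chmod 400` on the key file.
    os.chmod(pem_path, stat.S_IRUSR)
    return ["ssh", "-i", pem_path, f"{user}@{host}"]
```

The returned list can be handed to `subprocess.run` to open a session, for example `subprocess.run(ssh_command("Dremio_Cluster.pem", "<instance address>"))`, where the address is the public DNS name or IP of the instance shown in the EC2 console.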