AWS EC2 Deployment

Deployment Architecture

This section describes how to deploy Dremio on Amazon EC2.

Ports

The following ports must be open:

Purpose | Port | From | To
UI (HTTPS) | 9047 | Corporate network (end users) | Coordinators
ODBC/JDBC clients (e.g., Tableau, Power BI) | 31010 | Corporate network (end users) | Coordinators
ZooKeeper (internal) | 2181 | Other Dremio nodes (coordinators and executors) | Coordinators
Inter-node communication | 45678 | Other Dremio nodes | Executors
Data source reads | Varies | All Dremio nodes | Data source nodes
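
The inbound rows of the table above can be expressed as data and turned into security group rules. The following is a minimal Python sketch, not an official Dremio or AWS script. It only builds a list in the `IpPermissions` shape that boto3's `authorize_security_group_ingress` accepts; it does not call AWS. The function name `ingress_permissions`, the role labels, and the CIDR and group ID values are hypothetical placeholders. Data source reads are outbound and vary by source, so they are left out.

```python
# Inbound rules from the ports table, keyed by the node role that receives
# the traffic. "corporate" = end-user network, "cluster" = other Dremio nodes.
INBOUND_RULES = {
    "coordinator": [
        (9047, "corporate", "UI (HTTPS)"),
        (31010, "corporate", "ODBC/JDBC clients"),
        (2181, "cluster", "ZooKeeper (internal)"),
    ],
    "executor": [
        (45678, "cluster", "Inter-node communication"),
    ],
}


def ingress_permissions(role, corporate_cidr, cluster_group_id):
    """Build an IpPermissions-style list for one node role.

    corporate_cidr and cluster_group_id are supplied by the caller;
    nothing here is looked up from AWS.
    """
    permissions = []
    for port, source, purpose in INBOUND_RULES[role]:
        rule = {"IpProtocol": "tcp", "FromPort": port, "ToPort": port}
        if source == "corporate":
            # Traffic from end users: restrict by IP range.
            rule["IpRanges"] = [{"CidrIp": corporate_cidr, "Description": purpose}]
        else:
            # Traffic from other Dremio nodes: reference their security group.
            rule["UserIdGroupPairs"] = [
                {"GroupId": cluster_group_id, "Description": purpose}
            ]
        permissions.append(rule)
    return permissions


if __name__ == "__main__":
    # Placeholder values for illustration only.
    for rule in ingress_permissions(
        "coordinator", "203.0.113.0/24", "sg-0123456789abcdef0"
    ):
        print(rule)
```

If you use boto3, the returned list could be passed as the `IpPermissions` argument of `authorize_security_group_ingress`, together with the `GroupId` of the group being modified.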

Additional Requirements

Please refer to System Requirements for base requirements. The following are additional requirements/instructions for AWS EC2 deployments.

  • An AWS account
  • A single EC2 instance for each coordinator and/or executor node.
  • At least one coordinator node (which may also serve as the master node) and at least one executor node are required.
  • Master and coordinator node storage is used for metadata and logging.
  • Executor node storage is used for logging and spilling. Dremio can spill to multiple locally attached disks in a high-performance manner.
  • S3 bucket(s) are used for the reflection cache. Please refer to the Amazon S3 section of the Distributed Storage Guide for configuring Amazon S3 as Dremio's distributed storage.
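
The node-count requirements above can be checked before any instances are created. This is a small illustrative sketch; the `ClusterPlan` class and its method names are hypothetical and not part of Dremio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterPlan:
    coordinators: int  # includes the coordinator that also serves as master
    executors: int

    def validate(self):
        """Raise ValueError if the plan misses a required node type."""
        if self.coordinators < 1:
            raise ValueError("at least one coordinator node is required")
        if self.executors < 1:
            raise ValueError("at least one executor node is required")

    def instance_count(self):
        """One EC2 instance per coordinator or executor node."""
        self.validate()
        return self.coordinators + self.executors


if __name__ == "__main__":
    plan = ClusterPlan(coordinators=1, executors=3)
    print(plan.instance_count())
```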

Setting up Amazon EC2 Instances

[info] Prerequisites:
AWS account
If you do not already have an account, create one. These instructions do not cover the sign-up process. Amazon EC2 offers free-tier-eligible instances.

The following instructions for setting up your instances must be repeated for the Dremio master, coordinator, and executor machines.

  1. Launch Instance
    After you have signed up for an AWS account, log in to Amazon Web Services, click My Account, and navigate to the Amazon EC2 console.

  2. Select an AMI
    In this example, we’ll pick the Red Hat Enterprise Linux 64-bit AMI.

  3. Select Instance Type
    Select the ‘m5d.2xlarge’ instance type for coordinators and the ‘m5d.4xlarge’ instance type for executors. Refer to the System Requirements section for base hardware requirements.

    Coordinator instance type: m5d.2xlarge (recommended)

    Executor instance type: m5d.4xlarge (recommended)

  4. Configure Number of Instances
    One EC2 instance is required for each coordinator or executor node in your environment. At least one coordinator node (which may also serve as the master node) and at least one executor node are required. Set the number of instances according to the role you are creating machines for (master, coordinator, or executor) and how many nodes of that role you want.

    [info] Best Practice

    For better network connectivity, select "Placement group" and put your instances in a cluster placement group so they're deployed close to each other.

  5. Add Storage
    For the node that will be the Dremio master, increase the root disk size to 100 GB.

    [info] NOTE

    Master and coordinator node storage is used for metadata and logging. Executor node storage is used for logging and AGG/JOIN spilling. Dremio can spill to multiple locally attached disks in a high-performance manner.

  6. Instance Description
    Give the instances names that identify their types: Dremio master, coordinator, or executor.

  7. Define a Security Group
    Create a new security group (or modify an existing one) with rules allowing access to ports 22, 9047, and 31010. As the warning shown in the console highlights, you should limit the source to your corporate IP ranges.

  8. Launch Instance and Create Key Pair
    Amazon EC2 uses public-key cryptography to encrypt and decrypt login information. Public-key cryptography uses a public key to encrypt a piece of data, such as a password; the recipient then uses the private key to decrypt the data. The public and private keys are known as a key pair.

    You will be asked to choose a key pair (.pem file), which is used to log in to these instances. If this is your first time, you can generate a new key pair and download the .pem file to your computer.

    [warning] WARNING

    If you lose your .pem file, there is no way to recover it, and you will lose access to any instances associated with that key pair.

    Create a new key pair, name it “Dremio_Cluster”, and download the key pair (.pem) file to your local machine. Click Launch Instance.

  9. Repeat these steps for creating the instances for coordinators and executors.
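
For readers who prefer scripting over the console, the choices made in steps 2 through 8 map onto launch parameters. The sketch below only assembles a dictionary of keyword arguments in the shape boto3's `run_instances` accepts; it does not call AWS. Several things in it are assumptions rather than part of these instructions: the helper name `run_instances_params`, the `dremio-<role>` naming scheme, the use of the coordinator instance type for the master, the root device name `/dev/sda1` (check your AMI), and the placeholder AMI, security group, and placement group values.

```python
# Recommended instance types from step 3. Using the coordinator type for the
# master is an assumption (the master is itself a coordinator).
INSTANCE_TYPES = {
    "master": "m5d.2xlarge",
    "coordinator": "m5d.2xlarge",
    "executor": "m5d.4xlarge",
}

# Assumed root device name; verify it against the AMI you selected.
ROOT_DEVICE_NAME = "/dev/sda1"


def run_instances_params(role, count, image_id, security_group_id,
                         placement_group=None, key_name="Dremio_Cluster"):
    """Assemble run_instances-style keyword arguments for one node role."""
    params = {
        "ImageId": image_id,                        # step 2: AMI
        "InstanceType": INSTANCE_TYPES[role],       # step 3: instance type
        "MinCount": count,                          # step 4: number of instances
        "MaxCount": count,
        "SecurityGroupIds": [security_group_id],    # step 7: security group
        "KeyName": key_name,                        # step 8: key pair
        "TagSpecifications": [{                     # step 6: instance names
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": f"dremio-{role}"}],
        }],
    }
    if placement_group:
        # Best practice from step 4: keep instances close to each other.
        params["Placement"] = {"GroupName": placement_group}
    if role == "master":
        # Step 5: 100 GB root disk on the master node.
        params["BlockDeviceMappings"] = [
            {"DeviceName": ROOT_DEVICE_NAME, "Ebs": {"VolumeSize": 100}}
        ]
    return params


if __name__ == "__main__":
    # All identifiers below are placeholders.
    print(run_instances_params("executor", 3, "ami-EXAMPLE", "sg-EXAMPLE",
                               placement_group="dremio-pg"))
```

Building the parameters separately from the API call makes it easy to review what would be launched for each role before anything is created.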

Installing and Configuring Dremio

At this point, if you have already installed and configured Dremio, you should be able to SSH into each coordinator and executor instance, and the nodes should be able to reach one another within the cluster.
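
One way to confirm reachability is a plain TCP connection test, run from your workstation for port 22 or from a cluster node for the Dremio ports listed earlier. This is a minimal sketch using only the Python standard library; the function names and the host addresses in the example are hypothetical, and a successful TCP connection only shows that the port is reachable, not that Dremio is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_nodes(hosts, ports, timeout=3.0):
    """Map each (host, port) pair to whether it accepted a connection."""
    return {
        (host, port): port_is_open(host, port, timeout)
        for host in hosts
        for port in ports
    }


if __name__ == "__main__":
    # Placeholder private addresses; replace with your coordinator hosts.
    results = check_nodes(["10.0.0.10"], [22, 9047, 31010], timeout=1.0)
    for (host, port), ok in results.items():
        print(f"{host}:{port} {'open' if ok else 'unreachable'}")
```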

If you need to install and configure Dremio, see RPM and Tarball Installation for installation instructions and Dremio Configuration for configuration.