MapR Deployment (YARN)

Deployment Architecture

In YARN deployment mode, Dremio integrates with the YARN ResourceManager to secure compute resources in a shared multi-tenant environment. The integration enables enterprises to more easily deploy Dremio on a Hadoop cluster, including the ability to elastically expand and shrink execution resources. The following diagram illustrates the high-level deployment architecture of Dremio on a MapR cluster.

Key components of the overall architecture:

  • Dremio Coordinator should be deployed on an edge node.
  • Dremio Coordinator is subsequently configured, via the Dremio UI, to launch Dremio Executors in YARN containers. The number of Executors and the resources allocated to them can be managed through the Dremio UI. See system requirements for resource needs of each node type.
  • It is recommended that a dedicated YARN queue be set up for the Dremio Executors in order to avoid resource conflicts.
  • Dremio Coordinators and Executors are configured to use MapR-FS volumes for the cache and spill directories.

Additional Requirements

Please refer to System Requirements for base requirements. The following are additional requirements for YARN (MapR) deployments.

MapR

  • Installing Dremio requires MapR administrative privileges. Dremio services running on MapR clusters should run as the mapr user, or as a service user account with an impersonation ticket, and must have read privileges for the MapR-FS directories/files that will either be queried directly or that map to the Hive Metastore. A service ticket can be generated as shown below.
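
    On a secure cluster, a service ticket can be generated with the maprlogin utility. A minimal sketch, assuming the mapr service user and an illustrative ticket path (adjust both for your environment):

    # Authenticate, then generate a service ticket for the Dremio service user
    maprlogin password
    maprlogin generateticket -type service -user mapr -out /opt/dremio/dremio.ticket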

  • Create a dedicated MapR volume and directory for Dremio's distributed cache, as sketched below. The Dremio user should have read and write permissions.
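
    A minimal sketch, assuming a volume named dremio mounted at /dremio with a pdfs cache directory (names are illustrative):

    # Create the volume and cache directory, then give the service user ownership
    maprcli volume create -name dremio -path /dremio
    hadoop fs -mkdir /dremio/pdfs
    hadoop fs -chown mapr:mapr /dremio/pdfs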

  • Optionally, create a dedicated YARN queue for Dremio executor nodes with job submission privileges for the Dremio user. Here is a sample fair-scheduler.xml entry:

    <allocations>
     <queue name="dremio">
       <minResources>320000 mb,160 vcores,0 disks</minResources>
       <maxResources>640000 mb,320 vcores,0 disks</maxResources>
       <aclSubmitApps>mapr</aclSubmitApps>
     </queue>
    </allocations>
    

    Run the following for queue configuration changes to take effect:

    sudo -u mapr yarn rmadmin -refreshQueues
    
  • In order for the CPU configuration specified in Dremio to be used and enforced on the YARN side (see the yarn-site.xml sketch below):

    • CPU scheduling needs to be enabled in YARN.
    • Linux cgroup enforcement needs to be enabled in YARN.
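
    A minimal yarn-site.xml sketch for both settings. These are standard Hadoop YARN properties; the cgroup hierarchy value is an assumption to adapt to your cluster:

    <property>
      <name>yarn.nodemanager.container-executor.class</name>
      <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
    </property>
    <property>
      <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
      <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
    </property>
    <property>
      <!-- cgroup path under which YARN creates per-container groups (assumed) -->
      <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
      <value>/hadoop-yarn</value>
    </property>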

Network

Purpose                       Port   From           To
ZooKeeper (external MapR)     5181   Dremio nodes   ZooKeeper
CLDB (MapR)                   7222   Coordinators   CLDB
DataNodes (MapR)              5660   Dremio nodes   MapR data nodes
YARN ResourceManager (MapR)   8032   Coordinators   YARN RM
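
A quick reachability check from a Dremio node can be done with nc; the hostnames below are placeholders:

nc -zv <ZOOKEEPER_HOST> 5181
nc -zv <CLDB_HOST> 7222
nc -zv <MAPR_DATA_NODE> 5660
nc -zv <YARN_RM_HOST> 8032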

Install Dremio Coordinator

You can follow the instructions for RPM/Tarball Installation.

Installation should be done as the mapr user and not as the dremio user.
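
For example, an RPM-based install looks like the following; the package filename is an assumption based on the downloaded artifact:

sudo yum localinstall dremio-*.rpm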

Configure Dremio Coordinators

In dremio.conf:

  • Set master node and specify coordinator role:

    services: {
      coordinator.enabled: true,
      coordinator.master.enabled: true,
      executor.enabled: false
    }
    
  • Set local metadata location. This should be a directory that only exists on coordinator nodes:

    paths: {
      local: "/var/lib/dremio"
      ...
    }
    
  • Use the dedicated MapR volume created above as distributed cache location for all nodes:

    paths: {
      ...
      dist: "maprfs:///<MOUNT_PATH>/<CACHE_DIRECTORY>"
    }
    
  • Use MapR ZooKeeper for coordination:

    zookeeper: "<ZOOKEEPER_HOST_1>:5181,<ZOOKEEPER_HOST_2>:5181"
    services.coordinator.master.embedded-zookeeper.enabled: false
    
  • OPTIONAL - Set an alternative client endpoint port to avoid port collisions:

    services: {
      coordinator.client-endpoint.port: 31050
    }
    

In dremio-env:

  • Specify the path for the MapR ticket if the MapR cluster is secure:
    # For Secure Cluster
    export MAPR_TICKETFILE_LOCATION=<MAPR_TICKET_PATH>
    

Starting the Dremio Daemon

Once configuration is complete, you can start the Dremio Coordinator daemon with one of the following commands. Note that the daemon must be started as either the mapr user or a user configured with a service ticket.

sudo service dremio start
# OR
sudo -u mapr /opt/dremio/bin/dremio --config /etc/dremio/ start
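
To confirm the daemon started cleanly, check the service status and tail the server log (the log path assumes a default RPM layout):

sudo service dremio status
tail -f /var/log/dremio/server.log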

Completing Coordinator Setup

Open a browser and navigate to http://<COORDINATOR_NODE>:9047. The UI will then walk you through creating the first admin user.

Deploy Dremio Executors on YARN

Once Dremio Coordinator is successfully deployed:

  1. Navigate to the UI > Admin > Provisioning section.

  2. Select YARN, select MapR as your distribution, and enter the details. Dremio recommends having only one worker (YARN container) per node.

  3. Configure the Resource Manager and CLDB. The Resource Manager needs to be specified as a hostname or IP address (e.g. 192.168.0.1). CLDB defaults to maprfs:///; this only needs to be changed if connecting to a non-default MapR cluster.

  4. You can now monitor and manage the YARN executor nodes. The same can be confirmed from the YARN side, as shown below.
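
A quick way to confirm the executors from the YARN side is to list running applications; run this as a user with access to the Dremio queue:

yarn application -list -appStates RUNNING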

Sample Configuration Files

Sample dremio.conf for a master-coordinator node:

paths: {
  # the local path for dremio to store data.
  local: "/var/lib/dremio"

  # the distributed path for Dremio data, including job results, downloads, uploads, etc.
  dist: "maprfs:///dremio/pdfs"
}

zookeeper: "<MAPR_ZOOKEEPER1>:5181,<MAPR_ZOOKEEPER2>:5181"

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: false
}

Sample dremio-env for the coordinator node if MapR cluster is secure:

# For Secure Cluster
export MAPR_TICKETFILE_LOCATION=<MAPR_TICKET_PATH>
