MapR Deployment (YARN)

This topic describes how to deploy Dremio on MapR in YARN deployment mode.

Architecture

In YARN Deployment mode, Dremio integrates with YARN ResourceManager to secure compute resources in a shared multi-tenant environment. The integration enables enterprises to more easily deploy Dremio on a Hadoop cluster, including the ability to elastically expand and shrink the execution resources. The following diagram illustrates the high-level deployment architecture of Dremio on a MapR cluster.

Key components of the overall architecture:

  • Dremio Coordinator should be deployed on the edge node.
  • Dremio Coordinator is subsequently configured, via the Dremio UI, to launch Dremio Executors in YARN containers. The number of Executors and the resources allocated to them can be managed through the Dremio UI. See system requirements for resource needs of each node type.
  • It is recommended that a dedicated YARN queue be set up for the Dremio Executors in order to avoid resource conflicts.
  • Dremio Coordinators and Executors are configured to use MapR-FS volumes for the cache and spill directories.
  • Dremio implements a watchdog that monitors Dremio processes and exposes HTTP health checks, and it kills executor processes that do not shut down cleanly.

Step 1: Verify MapR-specific Requirements

Please refer to System Requirements for base requirements. The following are additional requirements for YARN (MapR) deployments.

Permissions

  • Installing Dremio requires MapR administrative privileges. Dremio services on MapR clusters should run as the mapr user, or as a service account with an impersonation ticket, and must have read privileges for the MapR-FS directories and files that will either be queried directly or that map to the Hive Metastore.

  • Create a dedicated MapR volume and directory for Dremio's distributed cache. The Dremio user should have read and write permissions on it.

  • Optionally, create a dedicated YARN queue for Dremio executor nodes with job submission privileges for the Dremio user. Note: Be sure to run sudo -u mapr yarn rmadmin -refreshQueues for queue configuration changes to take effect. Here is a sample fair-scheduler.xml entry:

    <allocations>
      <queue name="dremio">
        <minResources>320000 mb,160 vcores,0 disks</minResources>
        <maxResources>640000 mb,320 vcores,0 disks</maxResources>
        <aclSubmitApps>mapr</aclSubmitApps>
      </queue>
    </allocations>
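
    After editing fair-scheduler.xml, the scheduler can be reloaded and the new queue checked from the command line. This is a sketch against a live cluster; the queue name comes from the sample entry above:

        # Reload the scheduler configuration (run as the mapr user)
        sudo -u mapr yarn rmadmin -refreshQueues

        # Verify that the dremio queue exists and shows the expected capacity
        sudo -u mapr yarn queue -status dremio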
    

CPU Configuration

In order for the CPU configuration specified in Dremio to be used and enforced on the YARN side, you need to do the following:

  • Enable CPU scheduling in YARN.
  • Enable Linux CGroup enforcement in YARN.
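
As a sketch, CGroup enforcement is enabled through yarn-site.xml properties such as the following (property names are from stock Hadoop; verify them against your MapR/Hadoop version). CPU scheduling under the Fair Scheduler is typically enabled by setting a DRF scheduling policy in fair-scheduler.xml:

    <!-- yarn-site.xml: run containers under the Linux container executor
         with CGroup-based resource enforcement -->
    <property>
      <name>yarn.nodemanager.container-executor.class</name>
      <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
    </property>
    <property>
      <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
      <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
    </property>
    <property>
      <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
      <value>/hadoop-yarn</value>
    </property>

    <!-- fair-scheduler.xml: use Dominant Resource Fairness so vcores are
         considered alongside memory -->
    <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>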

Network Ports

Purpose                     | Port | From         | To
----------------------------|------|--------------|----------------
ZooKeeper (external MapR)   | 5181 | Dremio nodes | ZooKeeper nodes
CLDB (MapR)                 | 7222 | Coordinators | CLDB
Data nodes (MapR)           | 5660 | Dremio nodes | MapR data nodes
YARN ResourceManager (MapR) | 8032 | Coordinators | YARN RM
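
Connectivity on these ports can be spot-checked from a Dremio node before installation; the hostnames below are placeholders for your cluster:

    # Verify the MapR ZooKeeper, CLDB, and ResourceManager ports are reachable
    nc -zv <ZOOKEEPER_HOST> 5181
    nc -zv <CLDB_HOST> 7222
    nc -zv <RESOURCE_MANAGER_HOST> 8032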

Step 2: Install and Configure Dremio

This step involves installing and configuring Dremio on each node in your cluster.

Installing Dremio

Installation should be done as the mapr user and not as the dremio user. See Installing and Upgrading via RPM or Installing and Upgrading via Tarball for more information.

Configuring Dremio

[info] Note: When referring to a Dremio coordinator, the configuration is for a master-coordinator role.

Configuring Dremio via dremio.conf

The following properties must be reviewed and, if necessary, modified.

  • Specify a master-coordinator role for the coordinator node:

    services: {
      coordinator.enabled: true,
      coordinator.master.enabled: true,
      executor.enabled: false
    }
    
  • Specify a local metadata location that only exists on the coordinator node:

    paths: {
      local: "/var/lib/dremio"
      ...
    }
    
  • Specify a distributed cache location for all nodes using the dedicated MapR volume that you created:

    paths: {
      ...
      dist: "maprfs:///<MOUNT_PATH>/<CACHE_DIRECTORY>"
    }
    
  • Specify the MapR ZooKeeper for coordination:

    zookeeper: "<ZOOKEEPER_HOST_1>:5181,<ZOOKEEPER_HOST_2>:5181"
    services.coordinator.master.embedded-zookeeper.enabled: false
    
  • OPTIONAL - Set an alternative client endpoint port to avoid port collisions:

    services: {
      coordinator.client-endpoint.port: 31050
    }
    

Configuring Dremio via dremio-env

Specify the path for the MapR ticket if the MapR cluster is secure:

# For Secure Cluster
export MAPR_TICKETFILE_LOCATION=<MAPR_TICKET_PATH>
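
On a secure cluster, the ticket referenced above is typically generated with maprlogin. Treat this as a sketch: the exact options depend on your MapR version, and the output path is a hypothetical example, not a required location:

    # Generate a long-lived service ticket for the mapr user
    # (run as a cluster admin that already holds a valid ticket)
    maprlogin generateticket -type service -user mapr -out /opt/dremio/conf/mapr_ticket

    # Then point Dremio at it in dremio-env
    export MAPR_TICKETFILE_LOCATION=/opt/dremio/conf/mapr_ticket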

Starting the Dremio Daemon

Once configuration is complete, start the Dremio Coordinator daemon with one of the following commands. Note that it must be started either as a user configured with a service ticket or as the mapr user.

sudo service dremio start
# OR
sudo -u mapr /opt/dremio/bin/dremio --config /etc/dremio/ start
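
Once started, you can confirm that the coordinator came up by checking the service status and watching the server log. The log path below assumes a default RPM install layout:

    sudo service dremio status

    # Watch the coordinator log for startup errors
    tail -f /var/log/dremio/server.log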

Accessing the Dremio UI

Open a browser and navigate to http://<COORDINATOR_NODE>:9047. The Dremio UI flow walks you through creating the first Admin user.

Step 3: Deploy Dremio Executors on YARN

Once the Dremio Coordinator is successfully deployed:

  1. Navigate to the UI > Admin > Provisioning section.

  2. Select YARN, select MapR as your distribution, and enter the cluster details. Dremio recommends having only one worker (YARN container) per node.

  3. Configure the Resource Manager and CLDB. The Resource Manager must be specified as a hostname or IP address (e.g., 192.168.0.1). CLDB defaults to maprfs:///; this only needs to be changed if you are connecting to a non-default MapR cluster.

  4. You can now monitor and manage YARN executor nodes.

Sample Configuration Files

Sample dremio.conf for a coordinator node:

paths: {
  # the local path for dremio to store data.
  local: "/var/lib/dremio"

  # the distributed path for Dremio data, including job results, downloads, uploads, etc.
  dist: "maprfs:///dremio/pdfs"
}

zookeeper: "<MAPR_ZOOKEEPER1>:5181,<MAPR_ZOOKEEPER2>:5181"

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: false
}

Sample dremio-env for the coordinator node if the MapR cluster is secure:

# For Secure Cluster
export MAPR_TICKETFILE_LOCATION=<MAPR_TICKET_PATH>
