Version: 24.3.x

MapR Deployment (YARN)

This topic describes how to deploy Dremio on MapR in YARN deployment mode.

Architecture

In YARN Deployment mode, Dremio integrates with YARN ResourceManager to secure compute resources in a shared multi-tenant environment. The integration enables enterprises to more easily deploy Dremio on a Hadoop cluster, including the ability to elastically expand and shrink the execution resources. The following diagram illustrates the high-level deployment architecture of Dremio on a MapR cluster.

Key components of the overall architecture:

  • Dremio Coordinator should be deployed on the edge node.

  • Dremio Coordinator is subsequently configured, via the Dremio UI, to launch Dremio Executors in YARN containers. The number of Executors and the resources allocated to them can be managed through the Dremio UI. See system requirements for resource needs of each node type.

  • It is recommended that a dedicated YARN queue be set up for the Dremio Executors in order to avoid resource conflicts.

  • Dremio Coordinators and Executors are configured to use MapR-FS volumes for the cache and spill directories.

  • Dremio implements a watchdog to monitor Dremio processes and provides HTTP health checks so that executor processes that do not shut down cleanly can be killed.

Step 1: Verify MapR-specific Requirements

Please refer to System Requirements for base requirements. The following are additional requirements for YARN (MapR) deployments.

Permissions

  • Installing Dremio requires MapR administrative privileges. Dremio services running on MapR clusters should run as the mapr user, or as a service account with an impersonation ticket (see MapR 5.2.x, 6.1.x, or 6.2), and must have read privileges for the MapR-FS directories/files that will either be queried directly or that map to the Hive Metastore.

  • Create a dedicated MapR volume and directory for Dremio's distributed cache. The Dremio user should have read and write permissions.

  • Optionally, create a dedicated YARN queue for Dremio executor nodes with job submission privileges for the Dremio user.
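The dedicated volume above can be sketched with the MapR CLI; the volume name, mount path, and ownership below are placeholders, and the commands require MapR admin tools on the node:

```shell
# Placeholder names: adjust volume name, mount path, and ownership for your cluster.
#   maprcli volume create -name dremio.dist -path /dremio/pdfs
#   hadoop fs -chown mapr:mapr /dremio/pdfs
# The corresponding distributed-cache URI used later in dremio.conf would be:
dist_path="maprfs:///dremio/pdfs"
echo "$dist_path"
```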

note

Be sure to run sudo -u mapr yarn rmadmin -refreshQueues for queue configuration changes to take effect.

Sample fair-scheduler.xml entry
<allocations>
  <queue name="dremio">
    <minResources>320000 mb,160 vcores,0 disks</minResources>
    <maxResources>640000 mb,320 vcores,0 disks</maxResources>
    <aclSubmitApps>mapr</aclSubmitApps>
  </queue>
</allocations>

CPU Configuration

In order for the CPU configuration specified in Dremio to be used and enforced on the YARN side, you need to do the following:

  • Enable CPU scheduling in YARN.
  • Enable Linux CGroup enforcement in YARN.
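As a sketch, CGroup enforcement is typically enabled through NodeManager entries in yarn-site.xml like the following (these are standard Hadoop YARN properties; verify them against your MapR release before applying):

```xml
<!-- Enforce container CPU limits with Linux CGroups -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
```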

Network Ports

Purpose                        Port   From            To
ZooKeeper (External MapR)      5181   Dremio nodes    ZK
CLDB (MapR)                    7222   Coordinators    CLDB
DataNodes (MapR)               5660   Dremio nodes    MapR data nodes
YARN ResourceManager (MapR)    8032   Coordinators    YARN RM
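The reachability of these ports can be spot-checked from a Dremio node with a small shell loop. The host:port pairs below are placeholders for your cluster; this sketch assumes bash and the coreutils timeout command:

```shell
# Placeholder host:port pairs from the table above; replace with real hosts.
for target in zk-host:5181 cldb-host:7222 datanode:5660 rm-host:8032; do
  host="${target%%:*}"
  port="${target##*:}"
  # Try a TCP connect via bash's /dev/tcp with a 2-second timeout
  if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "OK   ${target}"
  else
    echo "FAIL ${target}"
  fi
done
```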

Step 2: Install and Configure Dremio

This step involves installing and configuring Dremio on each node in your cluster.

Installing Dremio

Installation should be done as the mapr user and not as the dremio user. See Installing and Upgrading via RPM or Installing and Upgrading via Tarball for more information.

Configuring Dremio

note

In this topic, references to the Dremio coordinator assume a master-coordinator role.

Configuring Dremio via dremio.conf

The following properties must be reviewed and, if necessary, modified.

  • Specify a master-coordinator role for the coordinator node:

    Specify master-coordinator role
    services: {
      coordinator.enabled: true,
      coordinator.master.enabled: true,
      executor.enabled: false
    }
  • Specify a local metadata location that only exists on the coordinator node:

    Specify local metadata location
    paths: {
      local: "/var/lib/dremio"
      ...
    }
  • Specify a distributed cache location for all nodes using the dedicated MapR volume that you created:

    Specify distributed cache location
    paths: {
      ...
      dist: "maprfs:///<MOUNT_PATH>/<CACHE_DIRECTORY>"
    }
  • Specify the MapR ZooKeeper for coordination:

    Specify MapR ZooKeeper
    zookeeper: "<ZOOKEEPER_HOST_1>:5181,<ZOOKEEPER_HOST_2>:5181"
    services.coordinator.master.embedded-zookeeper.enabled: false
  • OPTIONAL - Set an alternative client end port to avoid port collisions:

    Set alternative client end port (optional)
    services: {
      coordinator.client-endpoint.port: 31050
    }

Configuring Dremio via dremio-env

Specify the path for the MapR ticket if the MapR cluster is secure:

Specify path for MapR ticket
# For Secure Cluster
export MAPR_TICKETFILE_LOCATION=<MAPR_TICKET_PATH>
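For example, on a secure cluster the ticket might be generated with maprlogin and then exported like this. The path, user, and maprlogin options are placeholders; check the maprlogin documentation for your MapR release:

```shell
# Generate a service ticket first (placeholder command; requires MapR client tools):
#   maprlogin generateticket -type service -user mapr -out /opt/dremio/conf/maprticket
# Then point Dremio at the ticket file, e.g. in dremio-env:
export MAPR_TICKETFILE_LOCATION="/opt/dremio/conf/maprticket"
echo "$MAPR_TICKETFILE_LOCATION"
```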

Starting the Dremio Daemon

Once configuration is complete, you can start the Dremio Coordinator daemon with the following command. Note that the daemon must be started either as a user configured with a service ticket or as the mapr user.

Start Dremio Coordinator daemon
sudo service dremio start
# OR
sudo -u mapr /opt/dremio/bin/dremio --config /etc/dremio/ start

Accessing the Dremio UI

Open a browser and navigate to http://<COORDINATOR_NODE>:9047. The Dremio UI flow walks you through creating the first Admin user.

Step 3: Deploy Dremio Executors on YARN

After you deploy the Dremio Coordinator, follow these steps to deploy Dremio executors:

  1. Navigate to the Set Up YARN window by following either of these sets of steps:

    • If your version of Dremio displays a link labeled Admin in the top-right corner, follow these steps:

      a. Click Admin in the top-right corner of the screen.

      b. In the left panel, select Provisioning.

      c. Select YARN, and then select MapR as your distribution.

    • If your version of Dremio displays a gear icon in a sidebar on the left side of the screen, follow these steps:

      a. Click the gear icon.

      b. In the Engines section of the left panel, select Elastic Engines.

      c. In the upper-right corner, click Add Engine.

      d. In the Set Up YARN window, select MapR in the Hadoop Engine field.

  2. Enter the engine details. Dremio recommends having only one worker (YARN container) per node.

  3. In the Resource Manager field, follow either of these steps:

    • If Resource Manager HA is not enabled, specify the hostname or IP address of the resource manager.
    • If Resource Manager HA is enabled, specify the value of the property yarn.resourcemanager.cluster-id, which is in the file yarn-site.xml.
  4. In the CLDB field, accept the default of maprfs:///.
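If Resource Manager HA is enabled (step 3 above), the cluster id can be read from yarn-site.xml; the entry typically looks like the following, where the value shown is a placeholder:

```xml
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>my-mapr-yarn-cluster</value>
</property>
```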

You can now monitor and manage YARN executor nodes.
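The executor containers can also be spot-checked from the YARN side; a hedged sketch, assuming the YARN CLI is on the node and the dedicated queue from Step 1 is named dremio:

```shell
# Commands to run on a cluster node (commented out here because they need a live cluster):
#   yarn application -list -appStates RUNNING   # look for the Dremio application
#   yarn node -list                             # see which nodes run containers
queue="dremio"   # placeholder: the dedicated queue created in Step 1
echo "Expecting Dremio executor containers in queue: ${queue}"
```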

Sample Configuration Files

Sample dremio.conf file for a coordinator node
paths: {
  # the local path for Dremio to store data
  local: "/var/lib/dremio"

  # the distributed path for Dremio data, including job results, downloads, uploads, etc.
  dist: "maprfs:///dremio/pdfs"
}

zookeeper: "<MAPR_ZOOKEEPER1>:5181,<MAPR_ZOOKEEPER2>:5181"

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: false
}
Sample dremio-env for the coordinator node if the MapR cluster is secure
# For Secure Cluster
export MAPR_TICKETFILE_LOCATION=<MAPR_TICKET_PATH>