    MapR Deployment (YARN)

    This topic describes how to deploy Dremio on MapR in YARN deployment mode.


    In YARN Deployment mode, Dremio integrates with YARN ResourceManager to secure compute resources in a shared multi-tenant environment. The integration enables enterprises to more easily deploy Dremio on a Hadoop cluster, including the ability to elastically expand and shrink the execution resources. The following diagram illustrates the high-level deployment architecture of Dremio on a MapR cluster.

    Key components of the overall architecture:

    • Dremio Coordinator should be deployed on the edge node.
    • Dremio Coordinator is subsequently configured, via the Dremio UI, to launch Dremio Executors in YARN containers. The number of Executors and the resources allocated to them can be managed through the Dremio UI. See system requirements for resource needs of each node type.
    • It is recommended that a dedicated YARN queue be set up for the Dremio Executors in order to avoid resource conflicts.
    • Dremio Coordinators and Executors are configured to use MapR-FS volumes for the cache and spill directories.
    • Dremio implements a watchdog that monitors Dremio processes and uses HTTP health checks to kill executor processes that do not shut down cleanly.

    Step 1: Verify MapR-specific Requirements

    Please refer to System Requirements for base requirements. The following are additional requirements for YARN (MapR) deployments.


    • Installing Dremio requires MapR administrative privileges. Dremio services running on MapR clusters should run as the mapr user, or as a service user account with an impersonation ticket (see MapR 5.2.x, 6.1.x, or 6.2), and must have read privileges for the MapR-FS directories/files that will either be queried directly or that map to the Hive Metastore.

    • Create a dedicated MapR volume and directory for Dremio's distributed cache. The Dremio user should have read and write permissions on this directory.

    • Optionally, create a dedicated YARN queue for Dremio executor nodes with job submission privileges for the Dremio user. Note: Be sure to run sudo -u mapr yarn rmadmin -refreshQueues for queue configuration changes to take effect.

    Sample fair-scheduler.xml entry
        <queue name="dremio">
          <minResources>320000 mb,160 vcores,0 disks</minResources>
          <maxResources>640000 mb,320 vcores,0 disks</maxResources>
        </queue>
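    To see what the sample queue limits above mean in practice, the following Python sketch estimates how many executors fit in the queue. The per-executor sizes (16 GiB, 8 vcores) are illustrative assumptions, not Dremio requirements; substitute the sizes you configure in the Dremio UI.

    ```python
    # Rough capacity estimate for the sample "dremio" YARN queue above.
    # Per-executor sizes are assumptions for illustration, not Dremio defaults.

    QUEUE_MAX_MB = 640000      # <maxResources> memory from fair-scheduler.xml
    QUEUE_MAX_VCORES = 320     # <maxResources> vcores from fair-scheduler.xml

    EXECUTOR_MB = 16384        # assumed memory per Dremio executor container
    EXECUTOR_VCORES = 8        # assumed vcores per Dremio executor container

    def max_executors(queue_mb, queue_vcores, exec_mb, exec_vcores):
        """Executors that fit in the queue: bounded by both memory and vcores."""
        return min(queue_mb // exec_mb, queue_vcores // exec_vcores)

    print(max_executors(QUEUE_MAX_MB, QUEUE_MAX_VCORES, EXECUTOR_MB, EXECUTOR_VCORES))
    ```

    With these assumed sizes, memory is the binding constraint; raising minResources/maxResources or shrinking executors changes the bound accordingly.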

    CPU Configuration

    For the CPU configuration specified in Dremio to be used and enforced on the YARN side, you need to do the following:

    • Enable CPU scheduling in YARN.
    • Enable Linux CGroup enforcement in YARN.
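    The exact settings vary by scheduler and MapR release, so verify these against your cluster's documentation. As a sketch of the standard Hadoop knobs involved: with the Fair Scheduler shown above, CPU scheduling is typically enabled with the DRF policy in fair-scheduler.xml, and cgroup enforcement with these yarn-site.xml properties:

    ```xml
    <!-- fair-scheduler.xml: schedule on CPU as well as memory (DRF policy). -->
    <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>

    <!-- yarn-site.xml: enforce CPU limits with Linux cgroups. -->
    <property>
      <name>yarn.nodemanager.container-executor.class</name>
      <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
    </property>
    <property>
      <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
      <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
    </property>
    ```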

    Network Ports

    Purpose                        Port   From           To
    ZooKeeper (External MapR)      5181   Dremio nodes   ZK
    CLDB (MapR)                    7222   Coordinators   CLDB
    DataNodes (MapR)               5660   Dremio nodes   MapR data nodes
    YARN ResourceManager (MapR)    8032   Coordinators   YARN RM
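    Before installing, you can sanity-check that each node can reach these ports. A minimal Python sketch; the hostnames are placeholders you would substitute for your cluster:

    ```python
    import socket

    # Ports from the table above.
    MAPR_PORTS = {
        "ZooKeeper (External MapR)": 5181,
        "CLDB (MapR)": 7222,
        "DataNodes (MapR)": 5660,
        "YARN ResourceManager (MapR)": 8032,
    }

    def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
        """Return True if a TCP connection to host:port succeeds within the timeout."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    # Example (placeholder host): check the ResourceManager from a coordinator node.
    # can_reach("<YARN_RM_HOST>", MAPR_PORTS["YARN ResourceManager (MapR)"])
    ```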

    Step 2: Install and Configure Dremio

    This step involves installing and configuring Dremio on each node in your cluster.

    Installing Dremio

    Installation should be done as the mapr user and not as the dremio user. See Installing and Upgrading via RPM or Installing and Upgrading via Tarball for more information.

    Configuring Dremio


    In this topic, configuration for a Dremio coordinator refers to the master-coordinator role.

    Configuring Dremio via dremio.conf

    The following properties must be reviewed and, if necessary, modified.

    • Specify a master-coordinator role for the coordinator node:

      Specify master-coordinator role
      services: {
        coordinator.enabled: true,
        coordinator.master.enabled: true,
        executor.enabled: false
      }
    • Specify a local metadata location that exists only on the coordinator node:

      Specify local metadata location
      paths: {
        local: "/var/lib/dremio"
      }
    • Specify a distributed cache location for all nodes, using the dedicated MapR volume that you created:

      Specify distributed cache location
      paths: {
        dist: "maprfs:///<MOUNT_PATH>/<CACHE_DIRECTORY>"
      }
    • Specify the MapR ZooKeeper for coordination:

      Specify MapR ZooKeeper
      zookeeper: "<ZOOKEEPER_HOST_1>:5181,<ZOOKEEPER_HOST_2>:5181"
      services.coordinator.master.embedded-zookeeper.enabled: false
    • OPTIONAL - Set an alternative client end port (the default is 31010) to avoid port collisions, for example:

      Set alternative client end port (optional)
      services.coordinator.client-endpoint.port: 31050   # example value; the default client port is 31010

    Configuring Dremio via dremio-env

    Specify the path for the MapR ticket if the MapR cluster is secure:

    Specify path for MapR ticket
    # For Secure Cluster
    export MAPR_TICKETFILE_LOCATION=<PATH_TO_MAPR_TICKET_FILE>
    Starting the Dremio Daemon

    Once configuration is complete, you can start the Dremio Coordinator daemon with one of the following commands. Note that the daemon must be started either as the mapr user or as a user configured with a service ticket.

    Start Dremio Coordinator daemon
    sudo service dremio start
    # OR
    sudo -u mapr /opt/dremio/bin/dremio --config /etc/dremio/ start
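    After starting the daemon, you can confirm the coordinator's web UI (port 9047 by default) is answering before opening a browser. A minimal Python sketch:

    ```python
    import urllib.error
    import urllib.request

    def coordinator_ui_up(host: str, port: int = 9047, timeout: float = 5.0) -> bool:
        """Return True if the Dremio coordinator answers HTTP on its web UI port."""
        try:
            with urllib.request.urlopen(f"http://{host}:{port}", timeout=timeout):
                return True
        except (urllib.error.URLError, OSError):
            return False

    # Example (placeholder host): coordinator_ui_up("<COORDINATOR_NODE>")
    ```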

    Accessing the Dremio UI

    Open a browser and navigate to http://<COORDINATOR_NODE>:9047. The Dremio UI flow walks you through creating the first Admin user.

    Step 3: Deploy Dremio Executors on YARN

    After you deploy the Dremio Coordinator, follow these steps to deploy Dremio executors:

    1. Navigate to the Set Up YARN window by following either of these sets of steps:

      • If your version of Dremio displays a link labeled Admin in the top-right corner, follow these steps:

        a. Click Admin in the top-right corner of the screen.

        b. In the left panel, select Provisioning.

        c. Select YARN, and then select MapR as your distribution.

      • If your version of Dremio displays a gear icon in a sidebar on the left side of the screen, follow these steps:

        a. Click the gear icon.

        b. In the Engines section of the left panel, select Elastic Engines.

        c. In the upper-right corner, click Add Engine.

        d. In the Set Up YARN window, select MapR in the Hadoop Engine field.

    2. Enter details. Dremio recommends having only one worker (YARN container) per node.

    3. In the Resource Manager field, follow either of these steps:

      • If Resource Manager HA is not enabled, specify the hostname or IP address of the resource manager.
      • If Resource Manager HA is enabled, specify the value of the property yarn.resourcemanager.cluster-id, which is in the file yarn-site.xml.
    4. In the CLDB field, accept the default of maprfs:///.

    You can now monitor and manage YARN executor nodes.

    Sample Configuration Files

    Sample dremio.conf file for a coordinator node
    paths: {
      # the local path for Dremio to store data
      local: "/var/lib/dremio"
      # the distributed path for Dremio data, including job results, downloads, uploads, etc.
      dist: "maprfs:///dremio/pdfs"
    }
    zookeeper: "<MAPR_ZOOKEEPER1>:5181,<MAPR_ZOOKEEPER2>:5181"
    services: {
      coordinator.enabled: true,
      coordinator.master.enabled: true,
      executor.enabled: false
    }
    Sample dremio-env for the coordinator node if the MapR cluster is secure
    # For Secure Cluster
    export MAPR_TICKETFILE_LOCATION=<PATH_TO_MAPR_TICKET_FILE>