MapR-FS

Setup and Best Practices

Container Location Databases (CLDBs)

When adding a MapR-FS data source, be sure to list each node that runs a CLDB in your cluster. This will allow Dremio to continue to query the source in the event of a CLDB node failure.

Colocation

Unless your network hardware is exceptionally fast, colocating Dremio nodes with MapR-FS data nodes can noticeably reduce data transfer times and improve query performance.

Parquet File Performance

When MapR-FS data is stored in the Parquet file format, optimal performance is achieved by storing one Parquet row group per file, with a file size less than or equal to the MapR-FS chunk size. Parquet files that exceed the MapR-FS chunk size can slow down queries by incurring a considerable amount of filesystem overhead.
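As a rough illustration of the guideline above, the following sketch checks whether a file satisfies both conditions (one row group, size within one chunk). The function name and the 256 MiB default are assumptions made for this example; confirm the actual chunk size configured for your volume before relying on it.

```python
# Assumption: 256 MiB chunk size. Check the value configured on your volume.
ASSUMED_CHUNK_SIZE_BYTES = 256 * 1024 * 1024


def fits_layout_guideline(file_size_bytes, row_group_count,
                          chunk_size_bytes=ASSUMED_CHUNK_SIZE_BYTES):
    """Return True if the file has exactly one row group and fits in one chunk."""
    return row_group_count == 1 and file_size_bytes <= chunk_size_bytes


# A 200 MiB file with a single row group meets the guideline.
print(fits_layout_guideline(200 * 1024 * 1024, 1))  # True
# A 300 MiB file overruns the assumed chunk size.
print(fits_layout_guideline(300 * 1024 * 1024, 1))  # False
```

The row group count comes from the Parquet footer metadata; with the pyarrow library, for instance, it is available as `pyarrow.parquet.ParquetFile(path).metadata.num_row_groups`.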

NOTE: Ensure that your Dremio cluster can reach the required ports on each node of your MapR-FS source. By default, these are port 7222 for CLDB processes (this is the port to specify when adding the cluster's CLDBs in the source dialog) and ports 5660 and 6660, which are used for internal purposes.
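One way to verify port access from a Dremio node is a plain TCP connection test. The sketch below uses only the Python standard library; the function name `check_ports` is made up for this example, and a successful connection only shows that the port is reachable, not that the service behind it is healthy.

```python
import socket

# Default ports from the note: 7222 for CLDB, 5660 and 6660 for internal use.
DEFAULT_PORTS = (7222, 5660, 6660)


def check_ports(host, ports=DEFAULT_PORTS, timeout=3.0):
    """Return {port: bool}, True where a TCP connection to host:port succeeded."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout, or DNS failure.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results
```

Calling `check_ports("cldb1.example.com")` (a hypothetical hostname) from each Dremio node returns a dictionary such as `{7222: True, 5660: True, 6660: False}`, which points to the port that a firewall rule may be blocking.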

MapR Cluster Names

Dremio does not support MapR cluster names that are not valid in a URI (for example, names containing the "_" character). For such clusters, use an alias instead. The alias must be added to mapr-clusters.conf on all the nodes of the cluster.

Here are sample mapr-clusters.conf entries, followed by the command that generates a MapR ticket (maprticket) for the alias bestcluster:

mycluster_test secure=true 123.0.0.1:7222
bestcluster secure=true 123.0.0.2:7222

maprlogin password -cluster bestcluster
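To find cluster names that need an alias, a small script can scan the entries in mapr-clusters.conf. This is a minimal sketch under one assumption: a name is treated as URI-safe when it contains only letters, digits, hyphens, and dots, which is the character set allowed in a URI hostname. The exact rule Dremio applies may differ, and the function name is hypothetical.

```python
import re

# Assumption: URI-safe means letters, digits, hyphens, and dots only,
# starting and ending with a letter or digit. Dremio's exact rule may differ.
_URI_SAFE_NAME = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9.-]*[A-Za-z0-9])?$")


def find_unsafe_cluster_names(conf_text):
    """Return cluster names (first field of each entry) that are not URI-safe."""
    unsafe = []
    for line in conf_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name = fields[0]
        if not _URI_SAFE_NAME.match(name):
            unsafe.append(name)
    return unsafe


sample = """\
mycluster_test secure=true 123.0.0.1:7222
bestcluster secure=true 123.0.0.2:7222
"""
print(find_unsafe_cluster_names(sample))  # ['mycluster_test']
```

Any name the script reports is a candidate for an alias entry like the bestcluster line in the sample.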

Dremio and MapR-FS

Impersonation and Ownership Chaining

You can enable flexible control over file permissions by turning on impersonation in MapR-FS sources (check the 'impersonation' box in the source connection dialog). This means that users who access data stored on this source will have their access mediated by the MapR-FS privileges associated with their Dremio login name, rather than the ones associated with the Dremio daemon.

Enabling impersonation also permits a behavior called 'ownership chaining.' Under ownership chaining, MapR-FS data that is subject to restricted access can be shared with other Dremio users by creating a virtual dataset in a public (non-Home) space.

Dremio Configuration

Here are all available source-specific options:

| Name | Description |
| --- | --- |
| Cluster Name | MapR Cluster name. |
| Impersonation | Enable impersonation. |
| Encrypt Connection | Whether the cluster is secure or not. |
| Root Path | Root path for the MapR-FS source. |
| Properties | A list of additional MapR-FS connection properties. |
