MongoDB
Supported Versions
Dremio supports MongoDB 3.6 through 8.0.
Connect to MongoDB
- In the Dremio console, click Add Data on the Home page.
- In the Add Data dialog, select MongoDB.
- Configure the connection using the sections below, then click Save.
General
Under Name, enter the name of the connection. The name cannot include the following special characters: /, :, [, or ].
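The naming rule above can be checked before submitting the form. The following is a minimal sketch, not part of Dremio: the function names are hypothetical, and rejecting an empty name is an assumption that the text above does not state.

```python
# Characters the text above says a connection name cannot include.
FORBIDDEN_NAME_CHARS = set("/:[]")


def invalid_name_chars(name: str) -> list[str]:
    """Return the forbidden characters found in `name`, in order of first appearance."""
    found = []
    for ch in name:
        if ch in FORBIDDEN_NAME_CHARS and ch not in found:
            found.append(ch)
    return found


def is_valid_connection_name(name: str) -> bool:
    """Accept a name that is non-empty (assumption) and has no forbidden characters."""
    return bool(name) and not invalid_name_chars(name)
```

For example, `is_valid_connection_name("mongo/prod")` is false because the name contains `/`.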
Connection
| Name | Description |
|---|---|
| Hosts | A list of Mongo hosts. If MongoDB is sharded, enter the mongos hosts. Otherwise, enter the mongod host. |
| Port | A list of Mongo port numbers. Defaults to 27017. |
- Connection Scheme -- Select how to connect to MongoDB.
- Encrypt connection -- Forces an encrypted connection over SSL.
- Read from secondaries only -- Disables reading from primaries. Might degrade performance.
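To make the connection settings more concrete, the sketch below assembles a standard MongoDB connection string from comparable inputs. It is an illustration only: how Dremio builds its connection internally is not described here, and the mapping of Connection Scheme to `mongodb://` versus `mongodb+srv://`, of Encrypt connection to `tls=true`, and of Read from secondaries only to `readPreference=secondary` is an assumption. The function name and host names are hypothetical.

```python
from urllib.parse import urlencode


def build_mongo_uri(hosts, use_srv=False, encrypt=False, secondaries_only=False):
    """Assemble a MongoDB connection string from (host, port) pairs and options."""
    scheme = "mongodb+srv" if use_srv else "mongodb"
    if use_srv:
        # SRV connection strings take a single hostname and no port.
        host_part = hosts[0][0]
    else:
        host_part = ",".join(f"{host}:{port}" for host, port in hosts)

    options = {}
    if encrypt:
        options["tls"] = "true"
    if secondaries_only:
        options["readPreference"] = "secondary"

    query = f"?{urlencode(options)}" if options else ""
    return f"{scheme}://{host_part}/{query}"


# Two mongos routers of a sharded cluster on the default port.
print(build_mongo_uri([("mongos1.example.com", 27017), ("mongos2.example.com", 27017)]))
```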
Authentication
- No authentication method
- Master Authentication method (default)
  - Username -- MongoDB username
  - Password -- MongoDB password
  - Authentication database -- Database to authenticate against.
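In a standard MongoDB connection string, the three authentication fields correspond to the user-info part and the `authSource` option. The sketch below shows that correspondence and why credentials need percent-encoding. It assumes nothing about how Dremio stores or transmits credentials. The function names are hypothetical, and the `admin` default for the authentication database is an assumption.

```python
from urllib.parse import quote_plus


def credentials_prefix(username: str, password: str) -> str:
    """Percent-encode credentials so characters such as '@' or ':' stay unambiguous."""
    return f"{quote_plus(username)}:{quote_plus(password)}@"


def authenticated_uri(host: str, port: int, username: str, password: str,
                      auth_db: str = "admin") -> str:
    """Build a single-host connection string that authenticates against `auth_db`."""
    prefix = credentials_prefix(username, password)
    return f"mongodb://{prefix}{host}:{port}/?authSource={quote_plus(auth_db)}"


print(authenticated_uri("db.example.com", 27017, "dremio", "p@ss:word"))
```

Without the encoding step, the `@` and `:` in the password would be parsed as URI delimiters.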
Advanced Options
- Subpartition Size -- Number of records to be read by query fragments. This option can be used to increase query parallelism.
- Sample Size -- Number of records to be read when sampling to determine the schema for a collection. If zero, the sample size is unlimited.
- Sample Method -- The method (First or Last) by which records should be read when sampling a collection to determine the schema.
- Auth Timeout (millis) -- Authentication timeout in milliseconds.
- Field names are case insensitive -- When enabled, Dremio reads all known variations of a field name when determining the schema, ignoring any value set for Sample Size. All field name variations are then used when pushing an operation down to MongoDB.
- Connection Properties -- A list of additional MongoDB connection parameters.
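The sampling options above interact in a way that is easier to see in code. The following sketch infers a schema from in-memory documents. It is not Dremio's implementation: the function names, the default sample size of 100, and the use of Python type names are all assumptions made for illustration.

```python
def sample_documents(docs, sample_size=100, method="First"):
    """Pick the documents to inspect. A sample size of zero means no limit."""
    docs = list(docs)
    if sample_size == 0:
        return docs
    return docs[:sample_size] if method == "First" else docs[-sample_size:]


def infer_schema(docs, sample_size=100, method="First", case_insensitive=False):
    """Map each field to the name variations and Python type names seen in the sample."""
    # With case-insensitive field names, Sample Size is ignored and every document is read.
    sample = list(docs) if case_insensitive else sample_documents(docs, sample_size, method)
    schema = {}
    for doc in sample:
        for field, value in doc.items():
            key = field.lower() if case_insensitive else field
            entry = schema.setdefault(key, {"names": set(), "types": set()})
            entry["names"].add(field)
            entry["types"].add(type(value).__name__)
    return schema


docs = [{"name": "a", "qty": 1}, {"Name": "b", "qty": 2.5}, {"name": "c", "tags": ["x"]}]
print(infer_schema(docs, sample_size=1))                          # only the first document
print(infer_schema(docs, sample_size=1, case_insensitive=True))   # all documents are read
```

A small sample can miss fields or types that appear only in later documents, which is the trade-off that Sample Size and Sample Method control.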
Reflection Refresh
- Never refresh -- Select to prevent automatic refreshes. Otherwise, specify how often to refresh in hours, days, or weeks.
- Never expire -- Select to prevent expiration. Otherwise, specify the expiration time in hours, days, or weeks.
Metadata
Dataset Handling
- Remove dataset definitions if underlying data is unavailable (Default).
If this box is not checked and the underlying files under a folder are removed or the data is not accessible, Dremio does not remove the dataset definitions. Leaving the box unchecked is useful when files are temporarily deleted and then replaced with new sets of files.
Metadata Refresh
- Dataset Discovery -- Refresh interval for top-level object names such as names of databases and tables.
  - Fetch every -- Specify fetch time based on minutes, hours, days, or weeks. Default: 1 hour
- Dataset Details -- The metadata that Dremio needs for query planning, such as information about fields, types, shards, statistics, and locality.
  - Fetch mode -- Specify either Only Queried Datasets, All Datasets, or As Needed. Default: Only Queried Datasets
    - Only Queried Datasets -- Dremio updates details for previously queried objects. This mode increases query performance because less work is needed at query time for these datasets.
    - All Datasets -- Dremio updates details for all datasets. This mode increases query performance because less work is needed at query time.
    - As Needed -- Dremio updates details for a dataset at query time. This mode minimizes metadata queries on a source when it is not in use, but might lead to longer planning times.
  - Fetch every -- Specify fetch time based on minutes, hours, days, or weeks. Default: 1 hour
  - Expire after -- Specify expiration time based on minutes, hours, days, or weeks. Default: 3 hours
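One way to reason about the Fetch every and Expire after defaults is as two thresholds on the age of cached dataset details. The sketch below encodes that reading. It is an assumption about the semantics rather than a description of Dremio's scheduler, and the function name and state labels are hypothetical.

```python
from datetime import timedelta

FETCH_EVERY = timedelta(hours=1)    # default "Fetch every" for Dataset Details
EXPIRE_AFTER = timedelta(hours=3)   # default "Expire after" for Dataset Details


def metadata_state(age: timedelta,
                   fetch_every: timedelta = FETCH_EVERY,
                   expire_after: timedelta = EXPIRE_AFTER) -> str:
    """Classify cached dataset details by how long ago they were fetched."""
    if age >= expire_after:
        return "expired"       # assumed: too old to use and must be fetched again
    if age >= fetch_every:
        return "refresh due"   # assumed: still usable, but a refresh is due
    return "fresh"


print(metadata_state(timedelta(minutes=30)))
print(metadata_state(timedelta(hours=2)))
```

Under this reading, the expiration time should be longer than the fetch interval so that a missed refresh does not immediately invalidate the cached details.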
Privileges
This connection inherits privileges from Project settings. To grant specific users or roles additional privileges in this connection:
- Enter the username or role name that you want to grant access to and click the Add to Privileges button. The added user or role is displayed in the USERS/ROLES table.
- For the users or roles in the USERS/ROLES table, toggle the checkmark for each privilege you want to grant on the Dremio source that is being created.
- Click Save after setting the configuration.
See Privileges for additional information about privileges.
Edit a MongoDB Connection
- On the Open Catalog page, under Connections, right-click the connection and select Settings.
- Update the connection configuration as needed.
- Click Save.
Delete a MongoDB Connection
- On the Open Catalog page, under Connections, right-click the connection and select Delete.
- Click Delete to confirm.
Predicate Pushdowns
Dremio offloads these operations to MongoDB:
ABS
ADD
AND
CASE
CEIL
CONCAT
DAY_OF_MONTH
DIVIDE
EQUAL
EXP
FLOOR
GREATER
GREATER_OR_EQUAL
HOUR
LESS
LESS_OR_EQUAL
LN
LOG
LOG10
MAX
MIN
MINUTE
MOD
MONTH
MULTIPLY
NOT
NOT_EQUAL
OR
POW
REGEX
SECOND
SQRT
SUBSTR
SUBTRACT
TO_LOWER
TO_UPPER
TRUNC
YEAR
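To illustrate what pushing a predicate down means, the sketch below translates a small filter expression into a MongoDB query document using standard MongoDB query operators. It covers only the comparison and AND/OR entries from the list above. The tuple-based expression format and the function name are hypothetical, and the query that Dremio actually generates is not shown in this document.

```python
# Comparison names from the pushdown list mapped to MongoDB query operators.
COMPARISONS = {
    "EQUAL": "$eq",
    "NOT_EQUAL": "$ne",
    "GREATER": "$gt",
    "GREATER_OR_EQUAL": "$gte",
    "LESS": "$lt",
    "LESS_OR_EQUAL": "$lte",
}


def to_mongo_filter(expr):
    """Translate a nested tuple such as ("AND", a, b) or ("EQUAL", field, value)."""
    op, *args = expr
    if op in ("AND", "OR"):
        return {f"${op.lower()}": [to_mongo_filter(arg) for arg in args]}
    if op in COMPARISONS:
        field, value = args
        return {field: {COMPARISONS[op]: value}}
    raise ValueError(f"operation {op!r} is not handled by this sketch")


# Roughly: WHERE age >= 21 AND status = 'A'
print(to_mongo_filter(("AND", ("GREATER_OR_EQUAL", "age", 21), ("EQUAL", "status", "A"))))
```

When a filter like this runs inside MongoDB, only matching documents are returned to Dremio instead of the whole collection.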
For More Information
For information about Dremio data types, see Data Types.
Limitations
Queries that unnest nested fields are not allowed because they would produce incorrect schemas. To work around this limitation, push filters into the subquery or avoid referencing the alias.
Data Type Map
Dremio supports selecting the following MongoDB types. The table shows how each MongoDB type maps to a Dremio data type. MongoDB types that are not listed in the table are not supported in Dremio.
| MongoDB Database Type | Dremio Type |
|---|---|
| ARRAY | LIST |
| BINDATA | VARBINARY |
| BOOL | BOOLEAN |
| DATE | TIMESTAMP |
| DBPOINTER | { "namespace": VARCHAR, "id": VARBINARY } |
| DOUBLE | DOUBLE |
| INT | INTEGER (or DOUBLE if store.mongo.read_numbers_as_double is set) |
| JAVASCRIPT | VARCHAR |
| JAVASCRIPTWITHSCOPE | { "code": VARCHAR, "scope": { ... } } |
| LONG | BIGINT (or DOUBLE if store.mongo.read_numbers_as_double is set) |
| OBJECT | STRUCT |
| OBJECTID | VARBINARY |
| REGEX | { "pattern": VARCHAR, "options": VARCHAR } |
| STRING | VARCHAR |
| SYMBOL | VARCHAR |
| TIMESTAMP | TIMESTAMP |
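The mapping table can also be expressed as a lookup, which is convenient when validating a collection's types before querying it. This is a sketch built directly from the table: the function name is hypothetical, and `store.mongo.read_numbers_as_double` is modeled as a plain boolean parameter rather than read from Dremio.

```python
# MongoDB type -> Dremio type, copied from the table above.
MONGO_TO_DREMIO = {
    "ARRAY": "LIST",
    "BINDATA": "VARBINARY",
    "BOOL": "BOOLEAN",
    "DATE": "TIMESTAMP",
    "DBPOINTER": '{ "namespace": VARCHAR, "id": VARBINARY }',
    "DOUBLE": "DOUBLE",
    "INT": "INTEGER",
    "JAVASCRIPT": "VARCHAR",
    "JAVASCRIPTWITHSCOPE": '{ "code": VARCHAR, "scope": { ... } }',
    "LONG": "BIGINT",
    "OBJECT": "STRUCT",
    "OBJECTID": "VARBINARY",
    "REGEX": '{ "pattern": VARCHAR, "options": VARCHAR }',
    "STRING": "VARCHAR",
    "SYMBOL": "VARCHAR",
    "TIMESTAMP": "TIMESTAMP",
}


def dremio_type(mongo_type: str, read_numbers_as_double: bool = False) -> str:
    """Return the Dremio type for a MongoDB type, or raise if it is not in the table."""
    key = mongo_type.upper()
    if key not in MONGO_TO_DREMIO:
        raise ValueError(f"MongoDB type {mongo_type!r} is not supported")
    # INT and LONG are read as DOUBLE when the option is set.
    if read_numbers_as_double and key in ("INT", "LONG"):
        return "DOUBLE"
    return MONGO_TO_DREMIO[key]


print(dremio_type("LONG"))
print(dremio_type("LONG", read_numbers_as_double=True))
```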