Querying External Data Sources
Dremio enables users to run external queries, queries that use the native syntax of the relational database, to process SQL statements that are not yet supported by Dremio or are too complex to convert. Dremio supports external queries for the following relational databases:
- Amazon Redshift
- Elasticsearch
- IBM Db2
- Microsoft Azure Data Explorer
- Microsoft Azure Synapse Analytics
- Microsoft SQL Server
- MySQL
- Oracle
- PostgreSQL
- Snowflake
- Teradata
- Dremio Hub connectors that use ARP, Advanced Relational Pushdown.
Note
Dremio supports onlySELECT
SQL statements in external queries. Dremio does not support batchedSELECT
statements that return multiple result sets.
Enabling Users to Perform External Queries (Dremio 16.0+)
To perform external queries, users need the EXTERNAL_QUERY
privilege, which appears as External Query
on the Privileges screen. Administrators must manually grant each user this privilege at the System or Source level.
With this privilege, users can access all types of data assets that might exist in external sources, including databases and tables. Therefore, Dremio recommends that you restrict access to external sources. You can do so by, for example, granting only read-only access to datasets.
Enabling Users to Perform External Queries (Dremio 15.X and earlier)
A Dremio administrator must enable external queries for a relational database data source. Dremio recommends configuring access to the source using a database user with read-only access to the relational databases. Once enabled, Dremio users with edit access to the source also have access to all derived database assets, such as databases and tables.
Warning
Enabling this feature allows users to bypass dataset and table-specific permissions in Dremio.
You must have the following prerequisites before enabling external queries:
- A relational database data source in your Dremio deployment
- Credentials of a database user with read-only access to the relational databases in the data source
To enable external queries:
- Log in to Dremio as an administrator.
- On the
Datasets
page, clickExternal Sources
. - Click the relational database data source for which you want to enable external queries.
- Click the gear icon, to the right of the search box, then click
Advanced Options
in theEdit Source
modal. - Check
Grant External Query access
. - Click
Save
.
Note:
Dremio clears permissions, formats, and data reflections for all datasets created from external queries when you update the metadata for the data source.
Syntax
External queries use the following syntax:
table(<datasource-name>.external_query('<source-query'))
Dremio treats the source query as a string literal that must be surrounded by single quotation marks, '
. If an external query contains a string literal inside the source query, Dremio users must use double single quotes as shown in the following example:
SELECT * FROM table(postgresql.external_query('SELECT string_col FROM tbl WHERE string_col = *''*test*''*`
To escape a single quotation mark, use four consecutive quotation marks, as shown in the following example:
External query example with single quotation mark escapeSELECT * FROM table(postgresql.external_query('SELECT string_col from tbl where string_col = '''john''''s car''))
Example: Basic Query and Creating a Dremio Relational Database Data Source
The following example selects the entire table named Actor
from the data source named PostgreSQL
:
SELECT * FROM table(PostgreSQL.external_query('SELECT * FROM Actor'))
- Log in to Dremio.
- On the
Datasets
page, clickExternal Sources
. - Click the relational database data source that you want to query using native syntax.
- Enter your external query in the SQL Editor.
- Click
Run
.
Dremio users and developers can then save the results of their external queries to a virtual data source by clicking the disk image icon, then selecting Save As...
.
Important
Dremio does not support moving virtual data sources created from external queries until after Dremio performs the first refresh of the data source.
Example: Source Query Joining Tables
The following source query demontrates how to perform a JOIN
between a Dremio physical data source and a table in a virtual data source created from the results of an external query:
SELECT b.customer_id, a.product_id, a.price
FROM table(postgresql.external_query( source_a,
'SELECT product_id, price
FROM products' )) AS a,
source_b.sales AS b
WHERE b.product_id = a.product_id'