Connecting to Your Data
This section describes the data sources that you can configure and analyze using Dremio Cloud, including data lakes (distributed filesystems) and relational databases (external sources).
Dremio Cloud does not support case-sensitive data file names, table names, or column names.
For example, if you have three files whose names differ only in case (such as MARKET, Market, and market), Dremio Cloud cannot distinguish between them, which can produce unexpected query results.
For column names, if a table contains two columns whose names differ only in case (such as Trip_Pickup_DateTime and trip_pickup_datetime), one of the columns may disappear when the header is extracted.
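Because names that differ only in case cannot be told apart, it can help to check a list of file or column names for such collisions before adding the data as a source. The following is a minimal sketch, not part of Dremio Cloud: the function name `find_case_collisions` is hypothetical, and using `str.casefold()` is an assumption about how names should be compared.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group names that differ only by letter case.

    Hypothetical helper: casefold() is an assumed approximation of a
    case-insensitive comparison, not Dremio Cloud's documented behavior.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Keep only the groups that contain more than one distinct spelling.
    return {
        key: spellings
        for key, spellings in groups.items()
        if len(set(spellings)) > 1
    }


columns = ["Trip_Pickup_DateTime", "trip_pickup_datetime", "fare_amount"]
collisions = find_case_collisions(columns)
print(collisions)
```

Any non-empty result points to names that are worth renaming before the files or tables are queried.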
Data Source Support Matrix
The following sources are supported across AWS and Azure Dremio Cloud Projects:
Source Type | Source Name | AWS Supported? | Azure Supported? |
---|---|---|---|
Data-as-code | Dremio Arctic | Yes | Yes |
Metastore | AWS Glue | Yes | No |
Object Storage | Amazon S3 | Yes | Yes |
Object Storage | Azure Storage | Yes | Yes |
Dremio | Dremio-to-Dremio | Yes | Yes |
Relational Database | Amazon Redshift | Yes | No |
Relational Database | Apache Druid | Yes | No |
Relational Database | Microsoft SQL Server | Yes | Yes |
Relational Database | Oracle | Yes | Yes |
Relational Database | PostgreSQL | Yes | Yes |
Relational Database | Snowflake | Yes | Yes |
Relational Database | IBM Db2 | Yes | Yes |
Data-as-code
You can add an Arctic catalog as a source to enable Git-like data management and allow data engineers to manage the data lake with the same best practices Git enables for software development, including commits, tags, and branches.
Metastores
The AWS Glue Catalog is a metadata store that lets you store and share metadata in AWS.
Object Storage
You can run queries directly on the data in your data lake by formatting directories and files into tables. The following types of object storage are supported:
- Amazon S3
- Azure Storage
Dremio Software Clusters
You can connect to one or more other Dremio Software clusters and run queries on the data sources that they are connected to. You can even run queries that federate data across connected clusters. See Connecting to Another Dremio Software Cluster.
Relational Databases (External Sources)
You can run queries directly on the data in relational databases, which are referred to as external sources. In addition, you can run external queries:
- That use the native syntax of the relational database.
- To process SQL statements that are not supported by Dremio Cloud or are too complex to convert.
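As an illustration of the external queries described above, the sketch below builds the SQL text that wraps a native statement. It assumes the `TABLE(<source>.external_query('<native SQL>'))` form; verify this against the Dremio external query documentation before relying on it. The helper name `build_external_query` and the source name `my_postgres` are hypothetical.

```python
def build_external_query(source_name: str, native_sql: str) -> str:
    """Wrap a native SQL statement in an external query.

    Assumption: the source exposes an external_query table function, and
    the source name is a simple identifier that needs no quoting.
    """
    if not source_name.isidentifier():
        raise ValueError(f"unsupported source name: {source_name!r}")
    # Escape single quotes so the native statement fits in a SQL string literal.
    escaped = native_sql.replace("'", "''")
    return f"SELECT * FROM TABLE({source_name}.external_query('{escaped}'))"


query = build_external_query(
    "my_postgres",  # hypothetical source name
    "SELECT tablename FROM pg_tables WHERE schemaname = 'public'",
)
print(query)
```

The native statement is passed as a string, so the relational database parses it in its own dialect.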
Note: Decimal-to-decimal mappings are supported for relational database sources.
The following database sources are supported: