Connecting to Your Data
This section describes the data sources that you can configure and analyze using Dremio Cloud, including data lakes (distributed filesystems) and relational databases (external sources).
Dremio Cloud does not support case-sensitive data file names, table names, or column names.
For example, if you have three files with the same name but different cases (such as MARKET, Market, and market), Dremio Cloud cannot distinguish between them, which can produce unexpected results.
For column names, if two columns with the same name in different cases (such as Trip_Pickup_DateTime and trip_pickup_datetime) exist in a table, one of the columns may disappear when the header is extracted.
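Because names that differ only by case can collide, it may help to check file or column names before adding a source. The following is a minimal sketch, not part of Dremio Cloud: the function name `find_case_collisions` and the sample names are hypothetical, and it assumes that comparing case-folded names is a reasonable approximation of case-insensitive matching.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Return groups of names that differ only by letter case.

    Hypothetical helper for illustration; keys are the case-folded names,
    values are the original spellings in input order.
    """
    groups = defaultdict(list)
    for name in names:
        # casefold() is a more aggressive lower() intended for caseless matching
        groups[name.casefold()].append(name)
    # Keep only the names that appear in more than one spelling
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


# Sample column names (illustrative only)
columns = ["Trip_Pickup_DateTime", "trip_pickup_datetime", "fare_amount"]
collisions = find_case_collisions(columns)
print(collisions)
```

Any group reported here is a candidate for renaming before the data is queried.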
You can add an Arctic catalog as a source to enable Git-like data management and allow data engineers to manage the data lake with the same best practices Git enables for software development, including commits, tags, and branches.
The AWS Glue Catalog is a metadata store that lets you store and share metadata in AWS.
You can run queries directly on the data in your data lake by formatting directories and files into tables. The following types of object storage are supported:
Dremio Software Clusters
You can connect to one or more other Dremio Software clusters and run queries on the data sources that they are connected to. You can even run queries that federate data across connected clusters. See Connecting to Another Dremio Software Cluster.
Relational Databases (External Sources)
You can run queries directly on the data in relational databases, which are referred to as external sources. In addition, you can run external queries:
That use the native syntax of the relational database.
To process SQL statements that are not supported by Dremio Cloud or are too complex to convert.
Note: Decimal-to-decimal mappings are supported for relational database sources.
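As a rough illustration of an external query, the sketch below builds the SQL text that wraps a native statement. It assumes the `SELECT * FROM TABLE(<source>.external_query('<native SQL>'))` form and that single quotes inside the native statement are escaped by doubling them; check both against the Dremio SQL reference. The helper `build_external_query` and the source name `my_postgres` are hypothetical.

```python
def build_external_query(source_name, native_sql):
    """Wrap a native SQL statement in an external-query call.

    Hypothetical helper: assumes the TABLE(<source>.external_query('...'))
    form and quote-doubling for embedded single quotes.
    """
    escaped = native_sql.replace("'", "''")
    return f"SELECT * FROM TABLE({source_name}.external_query('{escaped}'))"


# Example with a made-up source and table
sql = build_external_query(
    "my_postgres",
    "SELECT id FROM customers WHERE region = 'EU'",
)
print(sql)
```

The native statement is passed through as a string, so it should use the syntax of the underlying database rather than Dremio SQL. The helper only doubles quotes and does not validate `source_name`, so it is not suitable for untrusted input.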
The following database sources are supported: