This topic provides information for configuring the Amazon S3 data source.
You can query files and directories stored in your S3 buckets. Dremio supports a number of different file formats. See Files and Directories for more information.
Amazon configuration involves:
To list your AWS account’s S3 buckets as a source, you must provide your AWS credentials in the form of your access and secret keys. You can find instructions for creating these keys in Amazon’s Access Key documentation.
Note: AWS credentials are not necessary if you are accessing only public S3 buckets.
The following sample IAM policy shows the minimum permissions that allow Dremio to read and query S3.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1554422960000",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListAllMyBuckets"
      ],
      "Resource": [
        "arn:aws:s3:::*"
      ]
    },
    {
      "Sid": "Stmt1554423012000",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::BUCKET-NAME"
      ]
    },
    {
      "Sid": "Stmt1554423050000",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::BUCKET-NAME/*"
      ]
    }
  ]
}
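To avoid hand-editing the BUCKET-NAME placeholder, the read policy above can be generated for a specific bucket. The following is a minimal Python sketch of that idea. The function name `build_read_policy` and the bucket name `my-dremio-data` are illustrative assumptions and are not part of Dremio or AWS tooling. The `Sid` fields are left out because they are optional in IAM policies.

```python
import json


def build_read_policy(bucket_name: str) -> dict:
    """Return the minimum read/query policy shown above for one bucket.

    `bucket_name` replaces the BUCKET-NAME placeholder (hypothetical helper).
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets Dremio list the account's buckets and find their regions.
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListAllMyBuckets"],
                "Resource": ["arn:aws:s3:::*"],
            },
            {
                # Lets Dremio list the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Lets Dremio read the objects themselves.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-dremio-data" is an example bucket name, not a real one.
    print(json.dumps(build_read_policy("my-dremio-data"), indent=2))
```

The printed JSON can be pasted into the IAM console in place of the sample above.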
The following sample IAM policy shows the minimum permissions that allow Dremio to write to S3, for example, to store reflections on S3.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::BUCKET-NAME",
        "arn:aws:s3:::BUCKET-NAME/*"
      ]
    },
    {
      "Sid": "VisualEditor2",
      "Effect": "Allow",
      "Action": [
        "s3:ListAllMyBuckets",
        "s3:HeadBucket"
      ],
      "Resource": "*"
    }
  ]
}
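Before attaching a policy, it can help to confirm that it names every object-level action from the write policy above. The following is a rough Python sketch of such a check. The names `REQUIRED_WRITE_ACTIONS`, `allowed_actions`, and `missing_write_actions` are hypothetical. The check only compares literal action names in `Allow` statements; it ignores `Resource` scoping, wildcards such as `s3:*`, conditions, and `Deny` statements, so it is not a substitute for evaluating the policy in AWS.

```python
# Object-level actions taken from the sample write policy above.
REQUIRED_WRITE_ACTIONS = {
    "s3:PutObject",
    "s3:GetObject",
    "s3:ListBucket",
    "s3:DeleteObject",
}


def allowed_actions(policy: dict) -> set:
    """Collect the literal action names from all Allow statements."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        # IAM permits a single statement object instead of a list.
        statements = [statements]
    actions = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        action = statement.get("Action", [])
        # "Action" may be a single string or a list of strings.
        actions.update([action] if isinstance(action, str) else action)
    return actions


def missing_write_actions(policy: dict) -> set:
    """Return the required write actions that the policy does not name."""
    return REQUIRED_WRITE_ACTIONS - allowed_actions(policy)


if __name__ == "__main__":
    # Example policy that only grants read access.
    read_only = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": "*",
            }
        ],
    }
    print(sorted(missing_write_actions(read_only)))
```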
Advanced options include:
WARNING
If your S3 datasets include large Parquet files with 100 or more columns, you must increase the maximum number of connections to S3 that each Dremio processing unit is allowed to spawn. To do this, add a connection property named fs.s3a.connection.maximum and set it to a custom value greater than the default of 100.
Optionally, you can configure your S3 source to connect through a proxy by adding the following properties in the settings for your S3 source:
Property Name | Description |
---|---|
fs.s3a.proxy.host | Proxy host. |
fs.s3a.proxy.port | Proxy port number. |
fs.s3a.proxy.username | Username for authenticated connections, optional. |
fs.s3a.proxy.password | Password for authenticated connections, optional. |
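The table above can be read as a small key/value set in which the two credential properties appear only when the proxy requires authentication. The following Python sketch illustrates that shape. The function name `proxy_properties`, the host `proxy.example.com`, and port `3128` are assumptions made for illustration; the resulting pairs would still need to be entered in the source settings.

```python
from typing import Dict, Optional


def proxy_properties(
    host: str,
    port: int,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, str]:
    """Build the proxy connection properties listed in the table above."""
    props = {
        "fs.s3a.proxy.host": host,
        "fs.s3a.proxy.port": str(port),
    }
    # Username and password are optional: include them only when supplied.
    if username is not None:
        props["fs.s3a.proxy.username"] = username
    if password is not None:
        props["fs.s3a.proxy.password"] = password
    return props


if __name__ == "__main__":
    # Hypothetical unauthenticated proxy.
    for name, value in proxy_properties("proxy.example.com", 3128).items():
        print(f"{name} = {value}")
```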
To connect to a bucket in AWS GovCloud, set the correct GovCloud endpoint for your S3 source by adding the following property in the settings:
Property Name | Description |
---|---|
fs.s3a.endpoint | e.g. s3-us-gov-west-1.amazonaws.com |
You can specify which users can edit. Options include:
As of Dremio 3.2.3, Minio is offered as an experimental S3-compatible plugin.
To configure your S3 source for Minio in the Dremio UI:
1. Add the connection property fs.s3a.path.style.access and set the value to true.
2. Add the fs.s3a.endpoint property and its corresponding server endpoint value (IP address). The value must not include the http(s):// prefix. For example, if the endpoint is http://123.1.2.3:9000, the value is 123.1.2.3:9000.
To configure your S3 source for Minio with an encrypted connection enabled:
1. Start the Minio server with its certificates directory: ./minio server [data folder] --certs-dir [certs directory]
2. Import the certificate into the Java cacerts keystore: <JAVA_HOME>/keytool -import -v -trustcacerts -alias alias -file cert-file -keystore cacerts -keypass changeit -storepass changeit
   Note: Replace alias with the alias name you want and replace cert-file with the absolute path of the certificate file used to start the Minio server.
3. Add the connection property fs.s3a.path.style.access and set the value to true.
4. Add the fs.s3a.endpoint property and its corresponding server endpoint value (IP address). The value must not include the http(s):// prefix. For example, if the endpoint is http://123.1.2.3:9000, the value is 123.1.2.3:9000.
Minio can be used as a distributed store for both SSL and unencrypted connections. See Configuring Distributed Storage for more information.
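The endpoint rule in the Minio steps above (the property value is the endpoint without its http(s):// prefix) is easy to get wrong when copying a URL. The following Python sketch shows one way to derive the value. The function names `endpoint_value` and `minio_properties` are hypothetical helpers, not part of Dremio or Minio.

```python
import re
from typing import Dict


def endpoint_value(endpoint: str) -> str:
    """Strip a leading http:// or https:// from a Minio endpoint URL."""
    return re.sub(r"^https?://", "", endpoint.strip(), flags=re.IGNORECASE)


def minio_properties(endpoint: str) -> Dict[str, str]:
    """Build the two connection properties named in the steps above."""
    return {
        "fs.s3a.path.style.access": "true",
        "fs.s3a.endpoint": endpoint_value(endpoint),
    }


if __name__ == "__main__":
    # Uses the example endpoint from the text above.
    print(endpoint_value("http://123.1.2.3:9000"))  # 123.1.2.3:9000
    print(minio_properties("https://123.1.2.3:9000"))
```

A value that already has no prefix passes through unchanged, so the helper is safe to apply to either form.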
As of Dremio 4.0 Enterprise Edition, cloud caching is available. See Configuring Cloud Cache for more information.
As of Dremio 4.0, AWS Key Management Service (KMS) is available for the S3 distributed store. See Configuring Distributed Storage for more information.
See the following Minio documentation for more information: