Amazon S3

Working with files stored in S3

You can query files and directories stored in your S3 buckets. Dremio supports a number of different file formats. To learn more, see the chapter on Files and Directories.

Amazon Configuration

Amazon S3 Credentials

To list your AWS account's S3 buckets as a source, you must provide your AWS credentials in the form of your access and secret keys. You can find instructions for creating these keys in Amazon's documentation.

Sample IAM Policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::TestBucket-Dremio"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::TestBucket-Dremio/*"
            ]
        }
    ]
}
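
The sample policy has three statements: the first lets the account list its buckets and look up their locations, the second lets it list the contents of one bucket, and the third lets it read the objects in that bucket. When the same policy is needed for several buckets, it can be generated rather than edited by hand. The following is a minimal sketch, assuming Python 3; the function name build_read_only_policy is hypothetical and is not part of Dremio or AWS tooling.

```python
import json


def build_read_only_policy(bucket: str) -> dict:
    """Return a policy document mirroring the sample above for one bucket.

    `bucket` is the bucket name only, without the "arn:aws:s3:::" prefix.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # List the account's buckets and look up their regions.
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets", "s3:GetBucketLocation"],
                "Resource": ["arn:aws:s3:::*"],
            },
            {
                # List the objects inside the bucket (bucket-level ARN).
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Read the objects themselves (object-level ARN, note the /*).
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Prints a policy equivalent to the sample, ready to paste into IAM.
    print(json.dumps(build_read_only_policy("TestBucket-Dremio"), indent=4))
```

Note that s3:ListBucket applies to the bucket ARN while s3:GetObject applies to the objects under it, which is why the two statements use different Resource values.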

NOTE: AWS credentials are not necessary if you are accessing only public S3 buckets.

Dremio Configuration

The following source-specific options are available:

Name                    Description
AWS Access Key          AWS access key.
AWS Access Secret       AWS access secret.
Enable SSL Encryption   Whether to enable secure connections.
External Buckets        A list of external buckets that are not included with the provided AWS account credentials.
Properties              A list of additional Amazon S3 connection properties.

WARNING: If your S3 datasets include large Parquet files with 100 or more columns, you need to increase the maximum number of connections to S3 that each Dremio processing unit is allowed to open. To do so, add a connection property named fs.s3a.connection.maximum and set it to a value greater than the default of 100.

Connecting through a proxy server

Optionally, you can configure your S3 source to connect through a proxy server. To do so, add the following entries under Properties in the settings for your S3 source:

Property Name            Description
fs.s3a.proxy.host        Proxy host.
fs.s3a.proxy.port        Proxy port number.
fs.s3a.proxy.username    Username for authenticated connections, optional.
fs.s3a.proxy.password    Password for authenticated connections, optional.
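
The proxy settings above can be staged as name/value pairs before they are entered in the source settings. This is a minimal sketch, assuming Python 3; the helper name proxy_properties and the dictionary shape are illustrative assumptions, not a Dremio API. Only the four property names come from the table above.

```python
from typing import Dict, Optional


def proxy_properties(
    host: str,
    port: int,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, str]:
    """Build the fs.s3a.proxy.* properties as a name -> value mapping.

    The username and password entries are included only when supplied,
    since they are optional for unauthenticated proxies.
    """
    if not host:
        raise ValueError("proxy host must not be empty")
    if not 1 <= port <= 65535:
        raise ValueError("proxy port must be between 1 and 65535")

    props = {
        "fs.s3a.proxy.host": host,
        "fs.s3a.proxy.port": str(port),  # property values are entered as text
    }
    if username is not None:
        props["fs.s3a.proxy.username"] = username
    if password is not None:
        props["fs.s3a.proxy.password"] = password
    return props


if __name__ == "__main__":
    # The host and port below are placeholder values for illustration.
    for name, value in proxy_properties("proxy.example.com", 8080).items():
        print(f"{name} = {value}")
```

Each resulting pair corresponds to one entry in the Properties list of the S3 source.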

Connecting to a bucket in AWS GovCloud

To connect to a bucket in AWS GovCloud, set the correct GovCloud endpoint for your S3 source. To do so, add the following entry under Properties in the settings for your S3 source:

Property Name      Description
fs.s3a.endpoint    GovCloud endpoint, for example s3-us-gov-west-1.amazonaws.com
