Oracle

Setup and Best Practices

Connection Information

In addition to a hostname or IP and port, you will also need to know the site identifier (SID) of the Oracle server.

Resource Allocation Considerations

For RDBMS sources like Oracle, Dremio's query execution is largely single threaded. This means that for each Oracle directed query, only one Dremio node will experience a computational load. So unlike most other data sources, larger Dremio clusters won't lead to faster individual query execution times. However, if you expect a large number of concurrent queries (as in the case of many simultaneous Dremio users) these will be distributed evenly across the nodes.

NOTE: Ensure that your Dremio cluster has access to the appropriate port for your Oracle source. By default this is port 1521.

Dremio and Oracle

Query Pushdowns

For some operations Dremio can tell the data source to execute that part of the query natively, often dramatically improving performance. These operations are called 'pushdowns.'

Since they share a common language (SQL), Dremio supports most operations as pushdowns in Oracle. These include:

  • Filter (SQL: WHERE)
  • Limit (SQL: LIMIT)
  • Sorting (SQL: ORDER BY)
  • Aggregation (SQL: GROUP BY)
  • Project (with expressions) (e.g. SQL: SELECT columnA + columnB, columnC, columnD)

[info] Note

Since Oracle has no boolean type, project operations that contain SQL expressions which evaluate to true or false (e.g. SELECT username, friends > 0), and filter operations that include boolean literals in a filter (e.g. WHERE currentAccount = true) cannot be executed as pushdowns.

Initial Connection

Depending on the number of tables in your Oracle source, the final step of adding it to Dremio can take anywhere from a few seconds to a few minutes as the source's metadata is processed. However, this is a one-time cost and further queries to the source will not incur additional metadata reads.

Dremio Configuration

General

Host

Name Description
Host Oracle host name.
Port Oracle port number. Defaults to 1521.
Service Name Service Name of your database.

Authentication

  • No authentication method
  • Master Authentication method (default)
    • Username -- Oracle user name
    • Password -- Oracle password
  • Encrypt connection -- Enables secure connections.

Advanced Options

  • Use timezone as connection region -- If checked, uses timezone to set connection region.
  • Include synonyms -- If checked, includes synonyms as datasets.
  • Record fetch size -- Number of records to fetch at once. Set to 0 (zero) to have Dremio automatically decide. Default: 10
  • Enable legacy dialect
  • Encryption -- Provide the SSL/TLS server certificate distinguished name, otherwise, leave blank to diable the DN match.

Reflection Refresh

  • Never refresh -- Specifies how often to refresh based on hours, days, weeks, or never.
  • Never expire -- Specifies how often to expire based on hours, days, weeks, or never.

    Metadata

Dataset Handling

  • Remove dataset definitions if underlying data is unavailable (Default).
    If this box is not checked and the underlying files under a folder are removed or the folder/source is not accessible, Dremio does not remove the dataset definitions. This option is useful in cases when files are temporarily deleted and put back in place with new sets of files.

Metadata Refresh

  • Dataset Discovery -- Refresh interval for top-level source object names such as names of DBs and tables.
    • Fetch every -- Specify fetch time based on minutes, hours, days, or weeks. Default: 1 hour
  • Dataset Details -- The metadata that Dremio needs for query planning such as information needed for fields, types, shards, statistics, and locality.
    • Fetch mode -- Specify either Only Queried Datasets, All Datasets, or As Needed. Default: Only Queried Datasets
      • Only Queried Datasets -- Dremio updates details for previously queried objects in a source.
        This mode increases query performance because less work is needed at query time for these datasets.
      • All Datasets -- Dremio updates details for all datasets in a source. This mode increases query performance because less work is needed at query time.
      • As Needed -- Dremio updates details for a dataset at query time. This mode minimized metadata queries on a source when not used, but might lead to longer planning times.
    • Fetch every -- Specify fetch time based on minutes, hours, days, or weeks. Default: 1 hour
    • Expire after -- Specify expiration time based on minutes, hours, days, or weeks. Default: 3 hours

      Sharing

You can specify which users can edit. Options include:

  • All users can edit.
  • Specific users can edit.

    Oracle TLS Configuration

    To use TLS to connect to an Oracle source, do the following:
  1. Select the option Connect using SSL/TLS when initially configuring it.
  2. Optionally add the Server Certificate Distinguished Name.
    If you want to ensure the server you’re connecting to be the exact one matching a particular certificate string then add the Server Certificate Distinguished Name under the SSL/TLS option in Advanced Options.
  3. Add the Certificate Authority certificate to Dremio's trust store.
    To add the CA certificate that is used to sign the Oracle certificate into Dremio's trust store:
    1. Import the CA certificate and convert the certificate into DER format (required by Java keytool). For example, using OpenSSL tool:
      $ openssl x509 -outform der -in oracle-ca.pem -out oracle-ca.der
    2. Add the certificate to a new or existing truststore.
      $ keytool -import -alias oracle-ca -keystore dremio-truststore.jks -file oracle-ca.der
    3. Modify the DREMIO_JAVA_SERVER_EXTRA_OPTS section of the dremio-env configuration file to use the trust store by adding the following:
      Djavax.net.ssl.trustStore=<path/to>/dremio-truststore.jks Djavax.net.ssl.trustStoreType=JKS
      Djavax.net.ssl.trustStorePassword=<password>

For More Information


results matching ""

    No results matching ""