SQL Server

Setup and Best Practices

Resource Allocation Considerations

For RDBMS sources like SQL Server, Dremio's query execution is largely single threaded. This means that for each SQL Server directed query, only one Dremio node will experience a computational load. So unlike most other data sources, larger Dremio clusters won't lead to faster individual query execution times. However, if you expect a large number of concurrent queries (as in the case of many simultaneous Dremio users) these will be distributed evenly across the nodes.

[info] Note

Ensure that your Dremio cluster has access to the appropriate port for your SQL Server source. By default this is port 1433.

Dremio and SQL Server

Query Pushdowns

For some operations Dremio can tell the data source to execute that part of the query natively, often dramatically improving performance. These operations are called 'pushdowns.'

Since they share a common language (SQL), Dremio supports most operations as pushdowns in SQL Server. These include:

  • Filter (SQL: WHERE)
  • Limit (SQL: LIMIT)
  • Sorting (SQL: ORDER BY)
  • Aggregation (SQL: GROUP BY)
  • Project (with expressions) (e.g. SQL: SELECT columnA + columnB, columnC, columnD)

NOTE: Since SQL Server has no boolean type, project operations that contain SQL expressions which evaluate to true or false (e.g. SELECT username, friends > 0), and filter operations that include boolean literals in a filter (e.g. WHERE currentAccount = true) cannot be executed as pushdowns.

Initial Connection

Depending on the number of tables in your SQL Server source, the final step of adding it to Dremio can take anywhere from a few seconds to a few minutes as the source's metadata is processed. However, this is a one-time cost and further queries to the source will not incur additional metadata reads.

Dremio Configuration

General

Connection

Name Description
Host SQL Server host name.
Port SQL Server port number. Defaults to 1433.
Database (Optional) Database name

Authentication

Authentication Type includes:

  • No authentication
  • Master Authentication.
    • Username -- SQL Server user name.
    • Password -- SQL Server password.

Advanced Options

  • Show only the initial database used for connecting -- If selected, hides the other DBs that the credential has access to.
  • Record fetch size -- Number of records to fetch at once. Set to 0 (zero) to have Dremio automatically decide. Default: 10
  • Enable legacy dialect

Reflection Refresh

  • Never refresh -- Specifies how often to refresh based on hours, days, weeks, or never.
  • Never expire -- Specifies how often to expire based on hours, days, weeks, or never.

    Metadata

Dataset Handling

  • Remove dataset definitions if underlying data is unavailable (Default).
    If this box is not checked and the underlying files under a folder are removed or the folder/source is not accessible, Dremio does not remove the dataset definitions. This option is useful in cases when files are temporarily deleted and put back in place with new sets of files.

Metadata Refresh

  • Dataset Discovery -- Refresh interval for top-level source object names such as names of DBs and tables.
    • Fetch every -- Specify fetch time based on minutes, hours, days, or weeks. Default: 1 hour
  • Dataset Details -- The metadata that Dremio needs for query planning such as information needed for fields, types, shards, statistics, and locality.
    • Fetch mode -- Specify either Only Queried Datasets, All Datasets, or As Needed. Default: Only Queried Datasets
      • Only Queried Datasets -- Dremio updates details for previously queried objects in a source.
        This mode increases query performance because less work is needed at query time for these datasets.
      • All Datasets -- Dremio updates details for all datasets in a source. This mode increases query performance because less work is needed at query time.
      • As Needed -- Dremio updates details for a dataset at query time. This mode minimized metadata queries on a source when not used, but might lead to longer planning times.
    • Fetch every -- Specify fetch time based on minutes, hours, days, or weeks. Default: 1 hour
    • Expire after -- Specify expiration time based on minutes, hours, days, or weeks. Default: 3 hours

      Sharing

You can specify which users can edit. Options include:

  • All users can edit.
  • Specific users can edit.

    For More Information


results matching ""

    No results matching ""