Developing Client Applications with Apache Arrow Flight

You can create client applications that use Arrow Flight to query data lakes at higher data-transfer speeds than ODBC and JDBC allow, without spending time and CPU resources on deserializing data. The performance benefit of Arrow Flight over ODBC or JDBC grows as the volume of transferred data increases.

You can run queries on datasets that are in the default project of a Dremio Cloud organization. Dremio Cloud is able to determine the organization and the default project from the authentication token that a Flight client uses. To query datasets in a non-default project, you can pass in the ID for the non-default project.

Dremio Cloud provides these endpoints for Arrow Flight connections:

  • In the US control plane: data.dremio.cloud:443
  • In the EU control plane: data.eu.dremio.cloud:443

All traffic within a control plane between Flight clients and Dremio Cloud goes through the endpoint for that control plane. However, Dremio Cloud can scale up or down automatically to accommodate increasing and decreasing traffic on the endpoint.
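As a minimal sketch of how a client might select one of these endpoints, the following plain-Java helper maps each control plane to a URI. The class and method names are illustrative and not part of any Dremio or Arrow API; the grpc+tls scheme is the one Arrow Flight uses for TLS-secured gRPC locations.

```java
import java.net.URI;

final class DremioEndpoints {
    /** The two control planes and their Arrow Flight hostnames. */
    enum ControlPlane {
        US("data.dremio.cloud"),
        EU("data.eu.dremio.cloud");

        private final String host;

        ControlPlane(String host) {
            this.host = host;
        }

        /** Returns the endpoint as a grpc+tls URI on port 443. */
        URI flightUri() {
            return URI.create("grpc+tls://" + host + ":" + PORT);
        }
    }

    // Both endpoints listen on port 443.
    static final int PORT = 443;

    public static void main(String[] args) {
        for (ControlPlane plane : ControlPlane.values()) {
            System.out.println(plane + " -> " + plane.flightUri());
        }
    }
}
```

A client for an organization in the EU control plane would connect to the URI returned by ControlPlane.EU.flightUri().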

Unless you pass in a different project ID, Arrow Flight clients run queries only against datasets that are in the default project or on data sources that are associated with the default project. By default, Dremio Cloud uses the oldest project in an organization as that organization's default project.

Organization administrators can specify which project to use as the default project. See "Setting the Default Project" in Managing Projects.

Supported Versions of Apache Arrow

Dremio Cloud supports client applications that use Arrow Flight in Apache Arrow version 6.0.

Supported Authentication Method

Client applications can authenticate to Dremio Cloud with personal access tokens (PATs). To create a PAT, follow the steps in the section Creating a Token.
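The sketch below shows one way a client might turn a PAT into the bearer-token header that accompanies each request. It assumes the token is sent in an authorization header with the Bearer scheme, and it reads the token from an environment variable named DREMIO_PAT, which is a hypothetical name chosen for this example. How the resulting headers are attached to a call depends on the Flight library you use.

```java
import java.util.Map;

final class PatAuth {
    /** Builds the call headers that carry a PAT as a bearer token. */
    static Map<String, String> bearerHeaders(String pat) {
        if (pat == null || pat.isBlank()) {
            throw new IllegalArgumentException("A personal access token is required");
        }
        // Surrounding whitespace (for example, a trailing newline) is removed.
        return Map.of("authorization", "Bearer " + pat.trim());
    }

    public static void main(String[] args) {
        // DREMIO_PAT is a hypothetical environment variable name.
        String pat = System.getenv("DREMIO_PAT");
        if (pat == null) {
            System.err.println("Set DREMIO_PAT before running this example");
            return;
        }
        // Print only the header names so that the token itself is not logged.
        System.out.println(bearerHeaders(pat).keySet());
    }
}
```

Reading the token from the environment rather than hard-coding it keeps the PAT out of source control.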

Flight Sessions

A Flight session lasts 120 minutes, during which a Flight client interacts with Dremio Cloud. A Flight client starts a new session by sending a getFlightInfo() request without a Cookie header that carries a session ID obtained from Dremio Cloud. All requests that pass the same session ID belong to the same session. The following steps show a typical exchange:

  1. The Flight client, having obtained a PAT from Dremio Cloud, sends a getFlightInfo() request that includes the query to run, the URI for the endpoint, and the bearer token (PAT). A single bearer token can be used for requests until it expires.

  2. If Dremio Cloud is able to authenticate the Flight client by using the bearer token, it sends a response that includes FlightInfo, a Set-Cookie header with the session ID, the bearer token, and a Set-Cookie header with the ID of the default project in the organization.

    FlightInfo responses from Dremio Cloud list exactly one endpoint, which is the endpoint for the control plane being used, together with the ticket for that endpoint.

    Session IDs are generated by Dremio Cloud.

  3. The client sends a getStream() request that includes the ticket, a Cookie header for the session ID, the bearer token, and a Cookie header for the ID of the default project.

  4. Dremio Cloud returns the query results in one flight.

  5. The Flight client sends another getFlightInfo() request using the same session ID and bearer token. If this second request did not include the session ID that Dremio Cloud sent in response to the first request, then Dremio Cloud would send a new session ID and a new session would begin.
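The session rule in the steps above can be modeled in a few lines of plain Java. This is only an illustrative model, not Dremio Cloud's implementation: the class is hypothetical, session IDs are random UUIDs, and the 120-minute duration is assumed to be measured from the start of the session.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

final class SessionModel {
    static final Duration SESSION_DURATION = Duration.ofMinutes(120);

    // Session ID -> time at which the session started.
    private final Map<String, Instant> startedAt = new HashMap<>();

    /**
     * Returns the session ID that a request belongs to. A request that presents
     * no ID, an unknown ID, or an expired ID starts a new session.
     */
    String resolve(String presentedId, Instant now) {
        Instant start = presentedId == null ? null : startedAt.get(presentedId);
        boolean live = start != null && now.isBefore(start.plus(SESSION_DURATION));
        if (live) {
            return presentedId;
        }
        String fresh = UUID.randomUUID().toString();
        startedAt.put(fresh, now);
        return fresh;
    }

    public static void main(String[] args) {
        SessionModel server = new SessionModel();
        Instant start = Instant.now();
        String first = server.resolve(null, start);                    // no cookie: new session
        String second = server.resolve(first, start.plusSeconds(300)); // same ID: same session
        String third = server.resolve(null, start.plusSeconds(600));   // cookie omitted: new session
        System.out.println("second request reused session: " + first.equals(second));
        System.out.println("third request reused session: " + first.equals(third));
    }
}
```

The model makes the consequence of step 5 concrete: a client that drops the session cookie silently loses whatever state was attached to the earlier session.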

Using a Non-Default Project

To run queries on datasets and data sources in a non-default project in Dremio Cloud, you must pass the project_id of that project as a session option. The project_id is stored in the user session, and the server responds with a Set-Cookie header containing the session ID. The client must include this cookie in all subsequent requests.

To enable this behavior, a cookie middleware must be added to the Flight client. This middleware is responsible for managing cookies and will add the previous session ID to all subsequent requests.
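To illustrate what such a middleware does, here is a plain-Java sketch of the bookkeeping involved: it stores the name=value pair from each Set-Cookie header and replays the stored pairs in a Cookie header. The CookieJar class and the cookie names and values are made up for this example; in a real client, the cookie middleware supplied by your Flight library does this work.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class CookieJar {
    // Cookie name -> value, in the order in which the cookies were first received.
    private final Map<String, String> cookies = new LinkedHashMap<>();

    /** Stores the name=value pair of each Set-Cookie header, ignoring attributes such as Path. */
    void onResponse(List<String> setCookieHeaders) {
        for (String header : setCookieHeaders) {
            String pair = header.split(";", 2)[0].trim();
            int eq = pair.indexOf('=');
            if (eq > 0) {
                cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
    }

    /** Returns the Cookie header value for the next request, or null if nothing is stored. */
    String cookieHeader() {
        if (cookies.isEmpty()) {
            return null;
        }
        return cookies.entrySet().stream()
            .map(e -> e.getKey() + "=" + e.getValue())
            .collect(Collectors.joining("; "));
    }

    public static void main(String[] args) {
        CookieJar jar = new CookieJar();
        // Cookie names and values are hypothetical.
        jar.onResponse(List.of("session_id=abc123; Path=/", "project_id=p-42"));
        System.out.println(jar.cookieHeader());
    }
}
```

Because the jar is attached to the client rather than to a single call, every later request carries the same session ID without any extra code at the call site.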

After adding the middleware when initializing the client object, the project_id can be passed as a session option.

The following example shows how to pass the project_id in Java:

Pass in the ID for a non-default project in Java
// Create a ClientCookieMiddleware and register it with the client builder
final FlightClient.Builder flightClientBuilder = FlightClient.builder();
final ClientCookieMiddleware.Factory cookieFactory = new ClientCookieMiddleware.Factory();
flightClientBuilder.intercept(cookieFactory);

// Build the client; allocator and location are assumed to be defined elsewhere
final FlightClient client = flightClientBuilder.allocator(allocator).location(location).build();

// Add the project ID to the session options
final SetSessionOptionsRequest setSessionOptionRequest =
    new SetSessionOptionsRequest(
        ImmutableMap.<String, SessionOptionValue>builder()
            .put("project_id", SessionOptionValueFactory.makeSessionOptionValue(yourprojectid))
            .build());

// Send the session options so that later queries in this session use that project
client.setSessionOptions(setSessionOptionRequest, bearerToken, headerCallOption);

// Close the session once your queries are done
client.closeSession(new CloseSessionRequest(), bearerToken, headerCallOption);

Note: In Dremio Cloud, the term catalog is sometimes used interchangeably with project_id, so passing catalog instead of project_id also selects a non-default project. However, because catalog is more commonly associated with Dremio Arctic, we recommend using project_id for clarity. This documentation uses project_id throughout.

Managing Workloads

Dremio administrators can use the Arrow Flight server endpoint to manage query workloads by adding the following connection properties to Flight clients:

Flight Client Property    Description
ENGINE                    Name of the engine to use to process all queries issued during the current session.
SCHEMA                    Name of the schema (data source or folder, including child paths, such as mySource.folder1 and folder1.folder2) to use by default when a schema is not specified in a query.
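As a rough sketch, a client might collect these properties into a map before attaching them to its connection. The property names ENGINE and SCHEMA come from the table above; the WorkloadProperties class, the engine name preview, and the idea of carrying the properties in a simple string map are assumptions made for this example. The mechanism for attaching them to a client depends on your Flight library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class WorkloadProperties {
    /** Collects the workload properties that are set; null or blank values are left out. */
    static Map<String, String> of(String engine, String schema) {
        Map<String, String> props = new LinkedHashMap<>();
        putIfSet(props, "ENGINE", engine);
        putIfSet(props, "SCHEMA", schema);
        return props;
    }

    private static void putIfSet(Map<String, String> props, String name, String value) {
        if (value != null && !value.isBlank()) {
            props.put(name, value.trim());
        }
    }

    public static void main(String[] args) {
        // "preview" is a hypothetical engine name; the schema path is taken from the table.
        System.out.println(of("preview", "mySource.folder1"));
        // Omitting the engine leaves only the SCHEMA property.
        System.out.println(of(null, "folder1.folder2"));
    }
}
```

Leaving out unset properties means Dremio Cloud applies its own defaults for them rather than receiving an empty value.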

Sample Arrow Flight Client Applications

Dremio provides sample Arrow Flight client applications in several languages at Dremio Hub.

The sample clients use the hostname local and the port number 32010 by default. Make sure that you override these defaults with the hostname data.dremio.cloud or data.eu.dremio.cloud and the port number 443.

Note: The Python sample application only supports connecting to the default Sonar project in Dremio Cloud.