Developing Client Applications with Apache Arrow Flight
You can create client applications that use Arrow Flight to query data lakes at data-transfer speeds greater than those possible with ODBC and JDBC, without incurring the time and CPU cost of deserializing data. As the volume of transferred data grows, the performance benefit of using Arrow Flight rather than ODBC or JDBC also grows.
You can run queries on datasets that are in the default project of a Dremio Cloud organization. Dremio Cloud is able to determine the organization and the default project from the authentication token that a Flight client uses.
Dremio Cloud provides these endpoints for Arrow Flight connections:
- In the US control plane:
- In the EU control plane:
All traffic between Flight clients and Dremio Cloud within a control plane goes through the endpoint for that control plane. Dremio Cloud can scale up or down automatically to accommodate increasing and decreasing traffic on the endpoint.
Arrow Flight clients can run queries only against datasets that are in the default project or on data sources that are associated with the default project. By default, Dremio Cloud uses the oldest project in an organization as that organization’s default project.
Organization administrators can specify which project to use as the default project. See “Setting the Default Project” in Managing Projects.
Supported Versions of Apache Arrow
Dremio Cloud supports client applications that use Arrow Flight in Apache Arrow version 6.0.
Supported Authentication Method
Client applications can authenticate to Dremio Cloud with personal access tokens (PATs). To create a PAT, follow the steps in the section Creating a Token.
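As a minimal sketch of how a client might attach a PAT to its requests, the helper below builds the header pair that carries the token as a bearer token. The function name `bearer_headers` is hypothetical, and the sketch assumes a Python client built on pyarrow, where the returned list could be passed as `pyarrow.flight.FlightCallOptions(headers=...)`.

```python
from typing import List, Tuple


def bearer_headers(pat: str) -> List[Tuple[bytes, bytes]]:
    """Return header pairs that carry the PAT as a bearer token.

    The name and shape of this helper are illustrative assumptions,
    not part of any Dremio or Arrow API.
    """
    if not pat:
        raise ValueError("a personal access token is required")
    # Flight call headers are (name, value) pairs of bytes.
    return [(b"authorization", f"Bearer {pat}".encode("utf-8"))]
```

Keeping the token in a call-level header means the same list can be reused for every request until the token expires.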
A Flight session lasts 120 minutes, during which a Flight client interacts with Dremio Cloud. A Flight client initiates a new session by sending a getFlightInfo() request that does not include a Cookie header specifying a session ID obtained from Dremio Cloud. All requests that pass the same session ID are considered to be in the same session.
The Flight client, having obtained a PAT from Dremio Cloud, sends a getFlightInfo() request that includes the query to run, the URI for the endpoint, and the bearer token (PAT). A single bearer token can be used for requests until it expires.
If Dremio Cloud is able to authenticate the Flight client by using the bearer token, it sends a response that includes FlightInfo, a Set-Cookie header with the session ID, the bearer token, and a Set-Cookie header with the ID of the default project in the organization.
FlightInfo responses from Dremio Cloud list exactly one endpoint, the endpoint for the control plane being used, together with the ticket for that endpoint.
Session IDs are generated by Dremio Cloud.
The client sends a getStream() request that includes the ticket, a Cookie header for the session ID, the bearer token, and a Cookie header for the ID of the default project.
Dremio Cloud returns the query results in one flight.
The Flight client sends another getFlightInfo() request using the same session ID and bearer token. If this second request did not include the session ID that Dremio Cloud sent in response to the first request, then Dremio Cloud would send a new session ID and a new session would begin.
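The request sequence above can be sketched in Python with pyarrow. This is an illustration under stated assumptions, not the sample client's implementation: `CookieStore` and `run_query` are hypothetical names, the host and port are left as parameters, a TLS connection is assumed, and pyarrow is imported inside `run_query` so that the cookie logic can be exercised without it. The store keeps whatever the server sends in Set-Cookie headers and replays it in a Cookie header, so later requests stay in the same session.

```python
from http.cookies import SimpleCookie


class CookieStore:
    """Remembers cookies from Set-Cookie headers and replays them."""

    def __init__(self):
        self.cookies = {}

    def update(self, set_cookie_values):
        # Each value is one Set-Cookie header; attributes such as Path
        # are parsed by SimpleCookie and not stored here.
        for value in set_cookie_values:
            parsed = SimpleCookie()
            parsed.load(value)
            for name, morsel in parsed.items():
                self.cookies[name] = morsel.value

    def header(self):
        # Build the value of a single Cookie request header.
        return "; ".join(f"{k}={v}" for k, v in self.cookies.items())


def run_query(host, port, pat, sql):
    """Run one query and return the results as an Arrow table."""
    import pyarrow.flight as flight  # imported lazily; requires pyarrow

    store = CookieStore()

    class CookieMiddleware(flight.ClientMiddleware):
        def received_headers(self, headers):
            for key, values in headers.items():
                if key.lower() == "set-cookie":
                    store.update(values)

        def sending_headers(self):
            cookie = store.header()
            return {"cookie": cookie} if cookie else {}

    class CookieMiddlewareFactory(flight.ClientMiddlewareFactory):
        def start_call(self, info):
            return CookieMiddleware()

    client = flight.FlightClient(
        f"grpc+tls://{host}:{port}",  # assumes a TLS endpoint
        middleware=[CookieMiddlewareFactory()],
    )
    options = flight.FlightCallOptions(
        headers=[(b"authorization", f"Bearer {pat}".encode("utf-8"))]
    )
    # First request: no session cookie yet, so a new session begins.
    info = client.get_flight_info(
        flight.FlightDescriptor.for_command(sql), options
    )
    # Second request: the ticket plus the cookies captured above.
    reader = client.do_get(info.endpoints[0].ticket, options)
    return reader.read_all()
```

Reusing one client object, and therefore one `CookieStore`, for a second `get_flight_info` call would send the stored session ID again; creating a fresh store would start a new session.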
Sample Arrow Flight Client Applications
Dremio provides sample Arrow Flight client applications in Java and Python at Dremio Hub. The Go client in this repository does not support connections to Dremio Cloud.
Both sample clients use the hostname local and the port number 32010 by default, so be sure to override these defaults with the hostname data.eu.dremio.cloud and the port number