Developing Arrow Flight SQL Client Applications for Dremio Cloud
You can use Apache Arrow Flight SQL to develop client applications that interact with Dremio Cloud. Apache Arrow Flight SQL is a new API developed by the Apache Arrow community for interacting with SQL databases. For more information about Apache Arrow Flight SQL, see the documentation for the Apache Arrow project.
Through Flight SQL, client applications can run queries, create prepared statements, and fetch metadata about the SQL dialect supported by datasource in Dremio, available types, defined tables, and more.
The requests for running queries are
- CommandExecute
- CommandStatementUpdate
The commands on prepared statements are:
- ActionClosePreparedStatementRequest: Closes a prepared statement.
- ActionCreatePreparedStatementRequest: Creates a prepared statement.
- CommandPreparedStatementQuery: Runs a prepared statement.
- CommandPreparedStatementUpdate: Runs a prepared statement that updates data.
The metadata requests that Dremio supports are:
- CommandGetDbSchemas: Lists the schemas that are in a catalog.
- CommandGetTables: Lists that tables that are in a catalog or schema.
- CommandGetTableTypes: Lists the table types that are supported in a catalog or schema. The types are Table, View, and System Table.
- CommandGetSqlInfo: Retrieves information about the datasource and the SQL dialect that it supports.
There are two clients already implemented and available in the Apache Arrow repository on GitHub for you to make use of:
At this time, you can only connect to the default Sonar project in Dremio Cloud.
Using the Sample Client
You can download and try out the sample client from https://github.com/dremio-hub/arrow-flight-sql-clients. Extract the content of the file and then, in a terminal window, change to the flight-sql-client-example
directory.
Before running the sample client, ensure that you have met these prerequisites:
- Add the Samples data lake to your Dremio Cloud project by clicking the icon in the Data Lakes section of the Datasets page.
- Ensure that Java 8 or later (up to Java 15) is installed on the system on which you run the example commands.
Command Syntax for the Sample Client
Use this syntax when sending commands to the sample client:
Sample client usageUsage: java -jar flight-sql-sample-client-application.jar -host localhost -port 443 ...
-command,--command <arg> Method to run
-dsv,--disableServerVerification <arg> Disable TLS server verification.
Defaults to false.
-host,--hostname <arg> `data.dremio.cloud` for Dremio's US control plane
`data.eu.dremio.cloud` for Dremio's European control plane
-kstpass,--keyStorePassword <arg> The jks keystore password.
-kstpath,--keyStorePath <arg> Path to the jks keystore.
-pat,--personalAccessToken <arg> Personal access token
-port,--flightport <arg> 443
-query,--query <arg> The query to run
-schema,--schema <arg> The schema to use
-sp,--sessionProperty <arg> Key value pairs of
SessionProperty, example: -sp
schema='Samples."samples.dremio.
com"' -sp key=value
-table,--table <arg> The table to query
-tls,--tls <arg> Enable encrypted connection.
Defaults to true.
Examples
The examples demonstrate what is returned for each of these requests:
- CommandGetDbSchemas
- CommandGetTables
- CommandGetTableTypes
- CommandExecute
These examples use the Flight endpoint for Dremio's US control plane: data.dremio.cloud
. To use Dremio's European control plane, use this endpoint instead: data.eu.dremio.cloud
.
Flight SQL Request: CommandGetDbSchemas
This command submits a CommandGetDbSchemas
request to list the schemas in a catalog.
java -jar flight-sql-sample-client-application.jar -tls true -host data.dremio.cloud -port 443 --pat '<personal-access-token>' -command GetSchemas
catalog_name db_schema_name
null $scratch
null @myUserName
null INFORMATION_SCHEMA
null Samples
null sys
Flight SQL Request: CommandGetTables
This command submits a CommandGetTables
request to list the tables that are in a catalog or schema.
java -jar flight-sql-sample-client-application.jar -tls true -host data.dremio.cloud -port 443 --pat '<personal-access-token>' -command GetTables -schema INFORMATION_SCHEMA
If you have a folder in your schema, you can escape it like this:
Example CommandGetTables request with folder in schemajava -jar flight-sql-sample-client-application.jar -tls true -host data.dremio.cloud -port 443 --pat '<personal-access-token>' -command GetTables -schema "Samples\ (1).samples.dremio.com"
catalog_name db_schema_name table_name table_type
null INFORMATION_SCHEMA CATALOGS SYSTEM_TABLE
null INFORMATION_SCHEMA COLUMNS SYSTEM_TABLE
null INFORMATION_SCHEMA SCHEMATA SYSTEM_TABLE
null INFORMATION_SCHEMA TABLES SYSTEM_TABLE
null INFORMATION_SCHEMA VIEWS SYSTEM_TABLE
Flight SQL Request: CommandGetTableTypes
This command submits a CommandTableTypes
request to list the table types supported.
java -jar flight-sql-sample-client-application.jar -tls true -host data.dremio.cloud -port 443 --pat '<personal-access-token>' -command GetTableTypes
table_type
TABLE
SYSTEM_TABLE
VIEW
Flight SQL Request: CommandExecute
This command submits a CommandExecute
request to run a single SQL statement.
java -jar flight-sql-sample-client-application.jar -tls true -host data.dremio.cloud -port 443 --pat '<personal-access-token>' -command Execute -query 'SELECT * FROM Samples."samples.<Dremio-user-name>.com"."NYC-taxi-trips" limit 10'
pickup_datetime passenger_count trip_distance_mi fare_amount tip_amount total_amount
2013-05-27T19:15 1 1.26 7.5 0.0 8.0
2013-05-31T16:40 1 0.73 5.0 1.2 7.7
2013-05-27T19:03 2 9.23 27.5 5.0 38.33
2013-05-31T16:24 1 2.27 12.0 0.0 13.5
2013-05-27T19:17 1 0.71 5.0 0.0 5.5
2013-05-27T19:11 1 2.52 10.5 3.15 14.15
2013-05-31T16:41 5 1.01 6.0 1.1 8.6
2013-05-31T16:37 1 1.25 8.5 0.0 10.0
2013-05-31T16:39 1 2.04 10.0 1.5 13.0
2013-05-27T19:02 1 11.73 32.5 8.12 41.12
Code Samples
Creating a FlightSqlClient
Refer to this code sample to create a FlightClient
. Then, wrap your FlightClient
in a FlightSqlClient
:
// Wraps a FlightClient in a FlightSqlClient
FlightSqlClient flightSqlClient = new FlightSqlClient(flightClient);
// Be sure to close the FlightSqlClient after using it
flightSqlClient.close();
Retrieving a List of Database Schemas
This code issues a CommandGetSchemas metadata request:
CommandGetSchemas metadata requestString catalog = null; // The catalog. (may be null)
String dbSchemaFilterPattern = null; // The schema filter pattern. (may be null)
FlightInfo flightInfo = flightSqlClient.getSchemas(catalog, dbSchemaFilterPattern);
Retrieving a List of Tables
This code issues a CommandGetTables metadata request:
CommandGetTables metadata requestString catalog = null; // The catalog. (may be null)
String dbSchemaFilterPattern = "Samples\\ (1).samples.dremio.com"; // The schema filter pattern. (may be null)
String tableFilterPattern = null; // The table filter pattern. (may be null)
List<String> tableTypes = null; // The table types to include. (may be null)
boolean includeSchema = false; // True to include the schema upon return, false to not include the schema.
FlightInfo flightInfo = flightSqlClient.getTables(catalog, dbSchemaFilterPattern, tableFilterPattern, tableTypes, includeSchema);
Retrieving a List of Table Types That a Database Supports
This code issues a CommandGetTableTypes metadata request:
CommandGetTableTypes metadata requestFlightInfo flightInfo = flightSqlClient.getTableTypes();
Running a Query
This code issues a CommandExecute request:
CommandExecute requestFlightInfo flightInfo = flightSqlClient.execute("SELECT * FROM Samples.\"samples.myUserName.com\".\"NYC-taxi-trips\" limit 10");
Consuming Data Returned for a Query
Consume data returned for queryFlightInfo flightInfo; // Use a FlightSqlClient method to get a FlightInfo
// 1. Fetch each partition sequentially (though this can be done in parallel)
for (FlightEndpoint endpoint : flightInfo.getEndpoints()) {
// 2. Get a stream of results as Arrow vectors
try (FlightStream stream = flightSqlClient.getStream(endpoint.getTicket())) {
// 3. Iterate through the stream until the end
while (stream.next()) {
// 4. Get a chunk of results (VectorSchemaRoot) and print it to the console
VectorSchemaRoot vectorSchemaRoot = stream.getRoot();
System.out.println(vectorSchemaRoot.contentToTSVString());
}
}
}
Client Interactions with Dremio Cloud
This diagram shows an example of how an Arrow Flight SQL client initiates a Flight session and runs a query. It also shows what messages pass between the proxy at the Arrow Flight SQL endpoint, the control plane, and the execution plane.
The Flight client, having obtained a PAT from Dremio Cloud, calls the
execute()
method, which then sends agetFlightInfo()
request. This request includes the query to run, the URI for the endpoint, and the bearer token (PAT). A single bearer token can be used for requests until it expires.A
getFlightInfo()
request initiates a new Flight session, which has a duration of 120 minutes. A Flight session is identified by its ID. Session IDs are generated by the proxy at the Arrow Flight SQL endpoint. All requests that pass the same session ID are considered to be in the same Flight session.The bearer token includes the user ID and the organization ID. From those two pieces of information, the proxy at the endpoint determines the project ID, and then passes the organization ID, project ID, and user ID in the
getFlightInfo()
request that it forwards to the control plane.If the control plane is able to authenticate the Flight client by using the bearer token, it sends a response that includes FlightInfo to the proxy.
FlightInfo responses include the single endpoint for the control plane being used and the ticket for that endpoint. There is only one endpoint listed in FlightInfo responses.
The proxy at the endpoint adds the session ID and the project ID, and passes the response to the client.
The client sends a
getStream()
request that includes the ticket, a Cookie header for the session ID, the bearer token, and a Cookie header for the ID of the default project.The proxy adds the organization ID and passes the
getStream()
request to the control plane.The control plane devises the query plan and sends that to the execution plane.
The execution plane runs the query and sends the results to the control plane in one flight.
The control plane passes the results to the proxy.
The proxy passes the results to the client.