Python
You can develop Dremio client applications in Python that use Arrow Flight. Here are code snippets to help you get going.
Prerequisites
Before you use any of the code snippets, ensure that you take these steps:
- Install and set up Python 3, which you can download from here.
- Install dependencies with either conda or pip:
  - To use conda, run this command:
    conda install -c conda-forge pyarrow pandas
  - To use pip, run this command:
    pip install pyarrow pandas
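To confirm that the installation worked before going further, a quick check along these lines may help. It is only a sketch: the function name `missing_packages` is hypothetical and not part of Dremio or PyArrow, and it only tests whether the packages can be found, not whether their versions are suitable.

```python
from importlib.util import find_spec

# The two packages installed in the prerequisites step.
REQUIRED_PACKAGES = ("pyarrow", "pandas")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If anything is reported as missing, rerun the conda or pip command in the same environment that runs your script.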
Helper Classes and Imports
Start with the following helper classes and imports. These pull in the correct modules and provide helper classes to construct the middleware for connecting to Dremio:
from http.cookies import SimpleCookie

from pyarrow import flight


class DremioClientAuthMiddlewareFactory(flight.ClientMiddlewareFactory):
    """A factory that creates DremioClientAuthMiddleware(s)."""

    def __init__(self):
        self.call_credential = []

    def start_call(self, info):
        return DremioClientAuthMiddleware(self)

    def set_call_credential(self, call_credential):
        self.call_credential = call_credential


class DremioClientAuthMiddleware(flight.ClientMiddleware):
    """
    A ClientMiddleware that extracts the bearer token from
    the authorization header returned by the Dremio
    Flight Server Endpoint.

    Parameters
    ----------
    factory : DremioClientAuthMiddlewareFactory
        The factory to set call credentials if an
        authorization header with bearer token is
        returned by the Dremio server.
    """

    def __init__(self, factory):
        self.factory = factory

    def received_headers(self, headers):
        auth_header_key = 'authorization'
        authorization_header = []
        for key in headers:
            if key.lower() == auth_header_key:
                # Look up the header under the key as received, so the
                # match also works when the server varies the casing.
                authorization_header = headers.get(key)
        if not authorization_header:
            raise Exception('Did not receive authorization header back from server.')
        self.factory.set_call_credential([
            b'authorization', authorization_header[0].encode('utf-8')])


class CookieMiddlewareFactory(flight.ClientMiddlewareFactory):
    """A factory that creates CookieMiddleware(s)."""

    def __init__(self):
        self.cookies = {}

    def start_call(self, info):
        return CookieMiddleware(self)


class CookieMiddleware(flight.ClientMiddleware):
    """
    A ClientMiddleware that receives and retransmits cookies.
    For simplicity, this does not auto-expire cookies.

    Parameters
    ----------
    factory : CookieMiddlewareFactory
        The factory containing the currently cached cookies.
    """

    def __init__(self, factory):
        self.factory = factory

    def received_headers(self, headers):
        # Cache every cookie the server sets so later calls can resend it.
        for key in headers:
            if key.lower() == 'set-cookie':
                cookie = SimpleCookie()
                for item in headers.get(key):
                    cookie.load(item)
                self.factory.cookies.update(cookie.items())

    def sending_headers(self):
        # Send all cached cookies back in a single cookie header.
        if self.factory.cookies:
            cookie_string = '; '.join(
                "{!s}={!s}".format(key, val.value)
                for (key, val) in self.factory.cookies.items())
            return {b'cookie': cookie_string.encode('utf-8')}
        return {}
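The cookie handling in CookieMiddleware can be tried on its own, without a Flight connection. The sketch below repeats the same two steps (parse the set-cookie values, then join the cached cookies into one cookie header) using only the standard library. The function names and the cookie names `session` and `route` are made up for illustration and are not values that Dremio is known to send.

```python
from http.cookies import SimpleCookie


def merge_set_cookie(cache, set_cookie_values):
    """Parse set-cookie header values and store the cookies in cache."""
    for item in set_cookie_values:
        cookie = SimpleCookie()
        cookie.load(item)
        cache.update(cookie.items())
    return cache


def build_cookie_header(cache):
    """Join cached cookies into a single cookie header value."""
    return "; ".join(
        "{!s}={!s}".format(key, morsel.value) for key, morsel in cache.items())


# Hypothetical set-cookie values, as a server might return them.
cache = {}
merge_set_cookie(cache, ["session=abc123; Path=/; HttpOnly", "route=node1"])
print(build_cookie_header(cache))
```

Attributes such as Path and HttpOnly are parsed but not sent back; only the name and value of each cookie are retransmitted, which matches what the middleware does.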
Connecting to Dremio
Your client applications can authenticate to Dremio with a username and a password, or with a personal access token.
Whichever of these two code snippets you use, place it below the following line, indented so that it forms the body of that block. The line itself must come after the helper classes and imports:
if __name__ == "__main__":
Connecting to Dremio with a Username and Password
Use this code to connect to Dremio with a username and password. Replace <hostname>, <username>, and <password> with your own values:
# Construct middleware.
client_auth_middleware = DremioClientAuthMiddlewareFactory()
client_cookie_middleware = CookieMiddlewareFactory()
headers = []
# Establish initial connection.
client = flight.FlightClient(
    "grpc+tcp://<hostname>:32010",
    middleware=[client_auth_middleware, client_cookie_middleware],
)
# Retrieve bearer token and append to the header for future calls.
bearer_token = client.authenticate_basic_token('<username>', '<password>', flight.FlightCallOptions(headers=headers))
headers.append(bearer_token)
Connecting to Dremio with a Personal Access Token
You can specify a personal access token (PAT) instead of a password. The instructions for creating a PAT are here.
Replace <hostname>, <username>, and <PAT> with your own values:
# Construct middleware.
client_auth_middleware = DremioClientAuthMiddlewareFactory()
client_cookie_middleware = CookieMiddlewareFactory()
headers = []
# Establish initial connection.
client = flight.FlightClient(
    "grpc+tcp://<hostname>:32010",
    middleware=[client_auth_middleware, client_cookie_middleware],
)
# Retrieve bearer token and append to the header for future calls.
bearer_token = client.authenticate_basic_token('<username>', '<PAT>', flight.FlightCallOptions(headers=headers))
headers.append(bearer_token)
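Both connection snippets hardcode the hostname and the credentials. One way to keep secrets out of source code is to read them from environment variables, as in the sketch below. The variable names (`DREMIO_HOSTNAME`, `DREMIO_USERNAME`, `DREMIO_PASSWORD_OR_PAT`) and both function names are assumptions chosen for this example, not settings that Dremio defines. The `grpc+tls` scheme is Arrow Flight's scheme for encrypted connections; whether your Dremio instance accepts it depends on how it is configured.

```python
import os

# Port used in the connection snippets in this section.
DEFAULT_FLIGHT_PORT = 32010


def flight_uri(hostname, port=DEFAULT_FLIGHT_PORT, tls=False):
    """Build the location string to pass to flight.FlightClient."""
    scheme = "grpc+tls" if tls else "grpc+tcp"
    return "{}://{}:{}".format(scheme, hostname, port)


def load_connection_settings(env=os.environ):
    """Read connection settings from (hypothetical) environment variables."""
    hostname = env.get("DREMIO_HOSTNAME", "localhost")
    username = env.get("DREMIO_USERNAME")
    secret = env.get("DREMIO_PASSWORD_OR_PAT")
    if not username or not secret:
        raise ValueError(
            "Set DREMIO_USERNAME and DREMIO_PASSWORD_OR_PAT before connecting.")
    return flight_uri(hostname), username, secret
```

The returned values can take the place of the placeholders: pass the URI to flight.FlightClient, and pass the username and secret to authenticate_basic_token.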
Querying Data
This example queries a sample table that is in Dremio’s Sample Source data source. You can add this data source in Dremio on the Datasets page by clicking Add Source and then selecting Sample Source under Object Storage.
# The query to execute.
query = 'SELECT * FROM Samples."samples.dremio.com"."NYC-taxi-trips" limit 10'
# Construct FlightDescriptor for the query result set.
flight_desc = flight.FlightDescriptor.for_command(query)
# Retrieve the schema of the result set.
options = flight.FlightCallOptions(headers=headers)
schema = client.get_schema(flight_desc, options)
# Get the FlightInfo message to retrieve the Ticket corresponding
# to the query result set.
flight_info = client.get_flight_info(flight_desc, options)
# Retrieve the result set as a stream of Arrow record batches.
reader = client.do_get(flight_info.endpoints[0].ticket, options)
# Print results.
print(reader.read_pandas())
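The query example reads only the first endpoint in the FlightInfo message. Arrow Flight allows a result set to be split across several endpoints, so a more defensive client can loop over all of them. The helper below is a sketch under that assumption; `read_all_endpoints` is a hypothetical name, and it assumes that every ticket can be redeemed through the same client, which may not hold if endpoints point to other locations.

```python
def read_all_endpoints(client, flight_info, options=None):
    """Fetch the result of every endpoint and return one table per endpoint.

    client is expected to be a pyarrow.flight.FlightClient and flight_info
    the object returned by client.get_flight_info().
    """
    tables = []
    for endpoint in flight_info.endpoints:
        # Redeem each ticket for a stream of record batches.
        reader = client.do_get(endpoint.ticket, options)
        # read_all() drains the stream into a single Arrow table.
        tables.append(reader.read_all())
    return tables
```

With the variables from the query example, this would be called as read_all_endpoints(client, flight_info, options). The resulting tables can then be combined with pyarrow.concat_tables and converted with to_pandas().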
Sample Application
For a sample application, see the python directory of Dremio’s arrow-flight-client-examples repository on GitHub at https://github.com/dremio-hub/arrow-flight-client-examples/tree/main/python.