Python
You can develop Dremio Cloud client applications in Python that use Arrow Flight. The following code snippets help you get started.
Prerequisites
Before you use any of the code snippets, ensure that you take these steps:
- Install and set up Python 3, which you can download from the official Python website (python.org).
- Install the pyarrow and pandas dependencies with either conda or pip.
With conda:
conda install -c conda-forge pyarrow pandas
With pip:
pip install pyarrow pandas
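To confirm that the installation worked, you can run a short check such as the following. It is a minimal sketch that only tries to import the modules the snippets on this page rely on; the function name check_dependencies is illustrative and not part of any library.

```python
import importlib
import sys


def check_dependencies(modules=("pyarrow", "pyarrow.flight", "pandas")):
    """Return a dict mapping each module name to True if it can be imported."""
    status = {}
    for name in modules:
        try:
            importlib.import_module(name)
            status[name] = True
        except Exception:
            # Any error during import (missing package, broken build) means
            # the module is not usable by the snippets on this page.
            status[name] = False
    return status


if __name__ == "__main__":
    # The snippets on this page assume Python 3.
    print("Python", sys.version.split()[0])
    for name, ok in check_dependencies().items():
        print("{}: {}".format(name, "OK" if ok else "missing"))
```

If pyarrow.flight is reported as missing while pyarrow is present, the installed pyarrow build likely does not include Flight support.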
Helper Classes and Imports
Start with the following imports and helper classes. They import the required modules and define the middleware that the client uses when connecting to Dremio Cloud:
from http.cookies import SimpleCookie

from pyarrow import flight


class DremioClientAuthMiddlewareFactory(flight.ClientMiddlewareFactory):
    """A factory that creates DremioClientAuthMiddleware(s)."""

    def __init__(self):
        self.call_credential = []

    def start_call(self, info):
        return DremioClientAuthMiddleware(self)

    def set_call_credential(self, call_credential):
        self.call_credential = call_credential


class DremioClientAuthMiddleware(flight.ClientMiddleware):
    """
    A ClientMiddleware that extracts the bearer token from
    the authorization header returned by the Dremio Cloud
    Flight Server Endpoint.

    Parameters
    ----------
    factory : DremioClientAuthMiddlewareFactory
        The factory to set call credentials if an
        authorization header with bearer token is
        returned by Dremio Cloud.
    """

    def __init__(self, factory):
        self.factory = factory

    def received_headers(self, headers):
        auth_header_key = 'authorization'
        authorization_header = []
        # Match the header name case-insensitively and read it by its actual key.
        for key in headers:
            if key.lower() == auth_header_key:
                authorization_header = headers.get(key)
        if not authorization_header:
            raise Exception('Did not receive authorization header back from server.')
        self.factory.set_call_credential([
            b'authorization', authorization_header[0].encode('utf-8')])


class CookieMiddlewareFactory(flight.ClientMiddlewareFactory):
    """A factory that creates CookieMiddleware(s)."""

    def __init__(self):
        self.cookies = {}

    def start_call(self, info):
        return CookieMiddleware(self)


class CookieMiddleware(flight.ClientMiddleware):
    """
    A ClientMiddleware that receives and retransmits cookies.
    For simplicity, this does not auto-expire cookies.

    Parameters
    ----------
    factory : CookieMiddlewareFactory
        The factory containing the currently cached cookies.
    """

    def __init__(self, factory):
        self.factory = factory

    def received_headers(self, headers):
        for key in headers:
            if key.lower() == 'set-cookie':
                cookie = SimpleCookie()
                for item in headers.get(key):
                    cookie.load(item)
                self.factory.cookies.update(cookie.items())

    def sending_headers(self):
        if self.factory.cookies:
            cookie_string = '; '.join(
                "{!s}={!s}".format(key, val.value)
                for (key, val) in self.factory.cookies.items())
            return {b'cookie': cookie_string.encode('utf-8')}
        return {}
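The cookie helpers contain no Flight-specific logic beyond the two hook methods, so you can exercise the same parsing and header-building steps with only the standard library. The sketch below mirrors what CookieMiddleware.received_headers and CookieMiddleware.sending_headers do; the function names merge_set_cookie_headers and build_cookie_header are hypothetical, and the cookie values are made-up examples.

```python
from http.cookies import SimpleCookie


def merge_set_cookie_headers(jar, set_cookie_values):
    """Parse Set-Cookie header values and merge them into jar (name -> Morsel)."""
    cookie = SimpleCookie()
    for item in set_cookie_values:
        cookie.load(item)
    # Later cookies with the same name replace earlier ones.
    jar.update(cookie.items())
    return jar


def build_cookie_header(jar):
    """Build the value of an outgoing Cookie header from the cached cookies."""
    return '; '.join(
        "{!s}={!s}".format(key, morsel.value) for key, morsel in jar.items())


jar = {}
# A first response sets two cookies; attributes such as Path are parsed but not resent.
merge_set_cookie_headers(jar, ["session=abc123; Path=/", "route=r1; Max-Age=3600"])
# A later response overwrites the session cookie.
merge_set_cookie_headers(jar, ["session=def456; Path=/"])
print(build_cookie_header(jar))  # session=def456; route=r1
```

As the CookieMiddleware docstring notes, attributes such as Max-Age are ignored, so cached cookies never expire on the client side.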
Connecting to Dremio Cloud
Your client applications can authenticate to Dremio Cloud with a personal access token (PAT).
Place this connection snippet below the line if __name__ == "__main__": (indented under it), and place that line after the rest of your code, such as the imports and helper classes.
For instructions on creating a PAT, see the Dremio Cloud documentation on personal access tokens.
Replace <PAT> in the code snippet with your token.
# Dremio Cloud connection via PAT
# TLS Encryption is enabled. Certificate verification is disabled.
headers = []
connection_args = {}
# Construct middleware.
client_cookie_middleware = CookieMiddlewareFactory()
# Disable server verification
connection_args['disable_server_verification'] = True
# Establish initial connection
client = flight.FlightClient("grpc+tls://data.dremio.cloud:443", middleware=[client_cookie_middleware], **connection_args)
# Send the PAT as a bearer token in the headers of subsequent calls.
headers.append((b'authorization', "Bearer {}".format('<PAT>').encode('utf-8')))
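Rather than pasting the PAT into the source, you might read it from an environment variable. The following is a minimal sketch of that approach combined with the script layout described earlier (definitions first, then the if __name__ == "__main__": guard). The variable name DREMIO_PAT and the functions build_auth_headers and main are illustrative assumptions, not names defined by Dremio or pyarrow.

```python
import os


def build_auth_headers(pat):
    """Return Flight call headers that send pat as a bearer token."""
    if not pat:
        raise ValueError("A personal access token (PAT) is required.")
    return [(b'authorization', "Bearer {}".format(pat).encode('utf-8'))]


def main():
    # DREMIO_PAT is a hypothetical variable name; use whatever your environment provides.
    pat = os.environ.get("DREMIO_PAT")
    if not pat:
        print("Set the DREMIO_PAT environment variable to your personal access token.")
        return
    headers = build_auth_headers(pat)
    # Create the FlightClient here as in the snippet above, then pass
    # flight.FlightCallOptions(headers=headers) to each call.
    print("Prepared {} header(s).".format(len(headers)))


if __name__ == "__main__":
    main()
```

The resulting list has the same shape as the headers list built in the connection snippet, so it can be passed to flight.FlightCallOptions unchanged.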
Querying Data
This example queries a sample table that is in Dremio Cloud’s Sample Source data source. You can add this data source in Dremio Cloud on the Datasets page by clicking Add Source and then selecting Sample Source under Object Storage.
# The query to execute.
query = 'SELECT * FROM Samples."samples.dremio.com"."NYC-taxi-trips" limit 10'
# Construct FlightDescriptor for the query result set.
flight_desc = flight.FlightDescriptor.for_command(query)
# Retrieve the schema of the result set.
options = flight.FlightCallOptions(headers=headers)
schema = client.get_schema(flight_desc, options)
# Get the FlightInfo message to retrieve the Ticket corresponding
# to the query result set.
flight_info = client.get_flight_info(flight_desc, options)
# Retrieve the result set as a stream of Arrow record batches.
reader = client.do_get(flight_info.endpoints[0].ticket, options)
# Print results.
print(reader.read_pandas())
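The sample query quotes only the path components that are not plain identifiers: "samples.dremio.com" contains dots and "NYC-taxi-trips" contains hyphens, while Samples is left unquoted. If you build queries for other tables, a small helper such as the hypothetical table_reference below can apply the same rule. It is a sketch under two assumptions: embedded double quotes are escaped by doubling them, as in standard SQL, and reserved words are not quoted, so check the Dremio SQL reference for those cases.

```python
import re

# Path components that are plain identifiers stay unquoted; anything else
# (dots, hyphens, spaces, ...) is wrapped in double quotes.
_PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def quote_component(name):
    """Double-quote a path component unless it is a plain identifier."""
    if _PLAIN_IDENTIFIER.match(name):
        return name
    # Assumption: an embedded double quote is escaped by doubling it.
    return '"{}"'.format(name.replace('"', '""'))


def table_reference(*path):
    """Join path components into a dot-separated table reference."""
    return ".".join(quote_component(part) for part in path)


def build_query(path, limit=10):
    # {:d} only accepts integers, so the limit cannot inject arbitrary SQL.
    return "SELECT * FROM {} LIMIT {:d}".format(table_reference(*path), limit)


print(build_query(["Samples", "samples.dremio.com", "NYC-taxi-trips"]))
# SELECT * FROM Samples."samples.dremio.com"."NYC-taxi-trips" LIMIT 10
```

The string returned by build_query can be passed to flight.FlightDescriptor.for_command in place of the hard-coded query.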
Sample Application
For a sample application, see the python directory of Dremio’s arrow-flight-client-examples repository on GitHub: https://github.com/dremio-hub/arrow-flight-client-examples/tree/main/python.