
    Python

    You can develop Dremio client applications in Python that use Arrow Flight. Here are code snippets to help you get going.

    Prerequisites

    Before you use any of the code snippets, ensure that you take these steps:

    • Install and set up Python 3 (available from python.org).

    • Install dependencies either with conda or with pip:

      • To use conda, run this command:

        $ conda install -c conda-forge pyarrow pandas
        
      • To use pip, run this command:

        $ pip install pyarrow pandas
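
    After installing, you may want to confirm that both packages are visible to the interpreter you plan to use. The following is a minimal sketch using only the standard library; the function name missing_packages and the constant REQUIRED_PACKAGES are illustrative and not part of Dremio or pyarrow.

```python
import importlib.util

# Packages that the snippets on this page rely on.
REQUIRED_PACKAGES = ("pyarrow", "pandas")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages are installed.")
```

    If the script reports a missing package, rerun the conda or pip command above in the same environment.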
        

    Helper Classes and Imports

    Start with the following helper classes and imports. These pull in the correct modules and provide helper classes to construct the middleware for connecting to Dremio:

    from http.cookies import SimpleCookie
    from pyarrow import flight
    
    class DremioClientAuthMiddlewareFactory(flight.ClientMiddlewareFactory):
        """A factory that creates DremioClientAuthMiddleware(s)."""
    
        def __init__(self):
            self.call_credential = []
    
        def start_call(self, info):
            return DremioClientAuthMiddleware(self)
    
        def set_call_credential(self, call_credential):
            self.call_credential = call_credential
    
    class DremioClientAuthMiddleware(flight.ClientMiddleware):
        """
        A ClientMiddleware that extracts the bearer token from
        the authorization header returned by the Dremio
        Flight Server Endpoint.
    
        Parameters
        ----------
        factory : DremioClientAuthMiddlewareFactory
            The factory to set call credentials if an
            authorization header with bearer token is
            returned by the Dremio server.
        """
    
        def __init__(self, factory):
            self.factory = factory
    
        def received_headers(self, headers):
            auth_header_key = 'authorization'
            authorization_header = []
            for key in headers:
                if key.lower() == auth_header_key:
                    # Look up the header under the key as received, whatever its case.
                    authorization_header = headers.get(key)
            if not authorization_header:
                raise Exception('Did not receive authorization header back from server.')
            self.factory.set_call_credential([
                b'authorization', authorization_header[0].encode('utf-8')])
    
    class CookieMiddlewareFactory(flight.ClientMiddlewareFactory):
        """A factory that creates CookieMiddleware(s)."""
    
        def __init__(self):
            self.cookies = {}
    
        def start_call(self, info):
            return CookieMiddleware(self)
    
    
    class CookieMiddleware(flight.ClientMiddleware):
        """
        A ClientMiddleware that receives and retransmits cookies.
        For simplicity, this does not auto-expire cookies.
    
        Parameters
        ----------
        factory : CookieMiddlewareFactory
            The factory containing the currently cached cookies.
        """
    
        def __init__(self, factory):
            self.factory = factory
    
        def received_headers(self, headers):
            for key in headers:
                if key.lower() == 'set-cookie':
                    cookie = SimpleCookie()
                    for item in headers.get(key):
                        cookie.load(item)
    
                    self.factory.cookies.update(cookie.items())
    
        def sending_headers(self):
            if self.factory.cookies:
                cookie_string = '; '.join("{!s}={!s}".format(key, val.value) for (key, val) in self.factory.cookies.items())
                return {b'cookie': cookie_string.encode('utf-8')}
            return {}
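
    To see what the cookie middleware does with header values, the parsing and re-sending steps can be tried in isolation, without a Flight server. This is a sketch under assumptions: the helper names parse_set_cookie and build_cookie_header and the sample header values are made up for illustration, and a real server chooses its own cookie names.

```python
from http.cookies import SimpleCookie


def parse_set_cookie(values):
    """Parse a list of Set-Cookie header values into a name -> value dict."""
    cookie = SimpleCookie()
    for item in values:
        cookie.load(item)
    return {name: morsel.value for name, morsel in cookie.items()}


def build_cookie_header(cookies):
    """Join cached cookies into a single Cookie request header value."""
    return "; ".join("{}={}".format(name, value) for name, value in cookies.items())


# Hypothetical Set-Cookie values, standing in for what a server might send.
cached = parse_set_cookie(["session=abc123; Path=/", "route=node1; HttpOnly"])
print(build_cookie_header(cached))
```

    Attributes such as Path and HttpOnly are parsed but not sent back; only the name=value pairs are returned to the server, which mirrors what sending_headers does above.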
    
    

    Connecting to Dremio

    Your client applications can authenticate to Dremio with a username and a password, or with a personal access token.

    When you use either of these two code snippets, place it after the helper classes in your script, indented under this line:

    if __name__ == "__main__": 
    

    Connecting to Dremio with a Username and Password

    Use this code to connect to Dremio with a username and password. Replace <hostname>, <username>, and <password>:

        # Construct middleware.
        client_auth_middleware = DremioClientAuthMiddlewareFactory()
        client_cookie_middleware = CookieMiddlewareFactory()
    
        headers = []
        
        # Establish initial connection.
        client = flight.FlightClient("grpc+tcp://<hostname>:32010", middleware=[client_auth_middleware, client_cookie_middleware])
    
        # Retrieve bearer token and append to the header for future calls.
        bearer_token = client.authenticate_basic_token('<username>', '<password>', flight.FlightCallOptions(headers=headers))
        headers.append(bearer_token)
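
    The location string passed to FlightClient can be assembled by a small helper rather than hard-coded. The sketch below is illustrative only: build_flight_uri is a hypothetical name, and dremio.example.com is a placeholder host. The grpc+tls scheme is the Arrow Flight scheme for encrypted connections; whether your Dremio server accepts it depends on how the server is configured.

```python
def build_flight_uri(hostname, port=32010, tls=False):
    """Build an Arrow Flight location string for the given host and port."""
    # grpc+tcp is an unencrypted connection; grpc+tls is an encrypted one.
    scheme = "grpc+tls" if tls else "grpc+tcp"
    return "{}://{}:{}".format(scheme, hostname, port)


# Placeholder hostname; replace it with your own server.
print(build_flight_uri("dremio.example.com"))
print(build_flight_uri("dremio.example.com", tls=True))
```

    The returned string can be passed as the first argument to flight.FlightClient in place of the literal used above.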
    

    Connecting to Dremio with a Personal Access Token

    You can specify a personal access token (PAT) instead of a password. For instructions on creating a PAT, see the Dremio documentation on personal access tokens.

    Use this code to connect to Dremio with a PAT. Replace <hostname>, <username>, and <PAT>:

        # Construct middleware.
        client_auth_middleware = DremioClientAuthMiddlewareFactory()
        client_cookie_middleware = CookieMiddlewareFactory()
    
        headers = []
        
        # Establish initial connection.
        client = flight.FlightClient("grpc+tcp://<hostname>:32010", middleware=[client_auth_middleware, client_cookie_middleware])
    
        # Retrieve bearer token and append to the header for future calls.
        bearer_token = client.authenticate_basic_token('<username>', '<PAT>', flight.FlightCallOptions(headers=headers))
        headers.append(bearer_token)
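
    Rather than pasting a PAT into source code, you might read it from the environment at run time. This is a minimal sketch, not a Dremio convention: the variable names DREMIO_USERNAME and DREMIO_PAT and the function load_credentials are assumptions chosen for illustration.

```python
import os


def load_credentials(env=os.environ):
    """Return (username, secret) read from environment variables.

    DREMIO_USERNAME and DREMIO_PAT are hypothetical variable names.
    """
    username = env.get("DREMIO_USERNAME")
    secret = env.get("DREMIO_PAT")
    if not username or not secret:
        raise RuntimeError("Set DREMIO_USERNAME and DREMIO_PAT before connecting.")
    return username, secret


# Example with an explicit mapping, so the sketch runs without real secrets.
print(load_credentials({"DREMIO_USERNAME": "alice", "DREMIO_PAT": "example-token"}))
```

    The two returned values can then be passed to client.authenticate_basic_token in place of the '<username>' and '<PAT>' literals.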
    

    Querying Data

    This example queries a sample table that is in Dremio’s Sample Source data source. You can add this data source in Dremio on the Datasets page by clicking Add Source and then selecting Sample Source under Object Storage. The snippet uses the client and headers variables created by either of the connection snippets above.

        # The query to execute.
        query = 'SELECT * FROM Samples."samples.dremio.com"."NYC-taxi-trips" limit 10'
    
        # Construct FlightDescriptor for the query result set.
        flight_desc = flight.FlightDescriptor.for_command(query)
    
        # Retrieve the schema of the result set.
        options = flight.FlightCallOptions(headers=headers)
        schema = client.get_schema(flight_desc, options)
    
        # Get the FlightInfo message to retrieve the Ticket corresponding
        # to the query result set.
        flight_info = client.get_flight_info(flight_desc, options)
    
        # Retrieve the result set as a stream of Arrow record batches.
        reader = client.do_get(flight_info.endpoints[0].ticket, options)
    
        # Print results.
        print(reader.read_pandas())
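
    For large result sets, it may be preferable to consume the stream one record batch at a time instead of loading everything with read_pandas(). The sketch below assumes that the reader returned by client.do_get can be iterated and that each chunk exposes its record batch as a data attribute, as in current pyarrow releases. The function name count_rows is illustrative, and the stand-in objects exist only so that the example runs without a server.

```python
from types import SimpleNamespace


def count_rows(reader):
    """Consume a stream chunk by chunk and return (batches, rows)."""
    batches = 0
    rows = 0
    for chunk in reader:
        # Each chunk is assumed to expose its record batch as `.data`.
        batches += 1
        rows += chunk.data.num_rows
    return batches, rows


# Stand-in chunks with 4 and 6 rows, mimicking the shape of a Flight stream.
fake_stream = [SimpleNamespace(data=SimpleNamespace(num_rows=n)) for n in (4, 6)]
print(count_rows(fake_stream))
```

    With a live connection, you would pass the reader from client.do_get to count_rows instead of the stand-in list. A stream can be consumed only once, so use either this approach or read_pandas() on a given reader, not both.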
    

    Sample Application

    For a sample application, see the python directory of Dremio’s arrow-flight-client-examples repository on GitHub at https://github.com/dremio-hub/arrow-flight-client-examples/tree/main/python.