    View a Catalog using its Path

    This API retrieves information about a specific catalog entity (source, space, folder, file, or dataset) using its path. If the entity has children, the response also includes each child's ID, path, type, and containerType.

    Syntax

    GET /api/v3/catalog/by-path/{path}
    

    The path is the Dremio path for the entity, using / as a separator. Each path component should be URL-encoded.

    Example Syntax

    For example, given a source called MySource which has a folder called MyFolder that contains a dataset called MyDataset, the URL will look like this:

    GET /api/v3/catalog/by-path/MySource/MyFolder/MyDataset

    If the dataset were called My?Dataset, the URL would be:

    GET /api/v3/catalog/by-path/MySource/MyFolder/My%3FDataset

    This is because ? is a reserved character in URLs (it starts the query string), so it must be URL-encoded as %3F.
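
    The encoding rule above can be sketched in Python. This is a minimal illustration, not part of the Dremio API: the helper name catalog_path_url and the BASE constant are hypothetical, and only the standard library function urllib.parse.quote is used.

```python
from urllib.parse import quote

# Endpoint prefix taken from the Syntax section above.
BASE = "/api/v3/catalog/by-path"


def catalog_path_url(components):
    """Join path components with "/", percent-encoding each one separately.

    Hypothetical helper for illustration only.
    """
    # safe="" makes quote() encode every reserved character in a component,
    # including "/" and "?", so only the separators we add remain literal.
    return BASE + "/" + "/".join(quote(part, safe="") for part in components)


print(catalog_path_url(["MySource", "MyFolder", "My?Dataset"]))
# /api/v3/catalog/by-path/MySource/MyFolder/My%3FDataset
```

    Encoding each component on its own, rather than the joined string, keeps a / that appears inside a component name from being mistaken for a separator.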

    Response Output

    The CatalogEntity is one of the following: source, space, folder, file, or dataset.

    Response Codes

    403 - User does not have permission to view the catalog entity.
    404 - A catalog entity with the specified path could not be found.
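
    A client might translate these codes into messages along the following lines. This is a sketch only: explain_status is a hypothetical helper, and treating any 2xx code as success is an assumption based on general HTTP conventions rather than something this page states.

```python
def explain_status(status_code):
    """Map a by-path response status to a short message (hypothetical helper)."""
    if status_code == 403:
        return "forbidden: user does not have permission to view the catalog entity"
    if status_code == 404:
        return "not found: no catalog entity exists at the specified path"
    # Assumption: any 2xx status means the entity was returned.
    if 200 <= status_code < 300:
        return "ok"
    # Other codes are not described in this section, so they are reported as-is.
    return f"unexpected status {status_code}"


for code in (200, 403, 404):
    print(code, explain_status(code))
```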

    Example: Get PDS by Path

    In this example, information is requested about a physical dataset (PDS), yellow_tripdata_2009-01.csv, located in the HDFS source DEV HDFS under the directory path data/nyctaxi. The space in the source name is URL-encoded as %20.

    HTTP

    GET localhost:9047/api/v3/catalog/by-path/DEV%20HDFS/data/nyctaxi/yellow_tripdata_2009-01.csv

    Curl

    curl -X GET \
      http://localhost:9047/api/v3/catalog/by-path/DEV%20HDFS/data/nyctaxi/yellow_tripdata_2009-01.csv \
      -H "Content-Type: application/json" \
      -H "Authorization: _dremiohs85l11k2mh0b10l51ett9fsca"
    

    Response

    For a physical dataset like this one, the response body includes the format settings and the data type of each field.

    {
      "entityType": "dataset",
      "id": "8a2df787-2e28-49ef-b961-52e214672d33",
      "type": "PHYSICAL_DATASET",
      "path": [
        "DEV HDFS",
        "data",
        "nyctaxi",
        "yellow_tripdata_2009-01.csv"
      ],
      "createdAt": "2019-01-10T16:10:29.676Z",
      "tag": "0",
      "format": {
        "type": "Text",
        "ctime": 0,
        "isFolder": false,
        "location": "/data/nyctaxi/yellow_tripdata_2009-01.csv",
        "fieldDelimiter": ",",
        "skipFirstLine": false,
        "extractHeader": true,
        "quote": "\"",
        "comment": "#",
        "escape": "\"",
        "lineDelimiter": "\r\n",
        "autoGenerateColumnNames": true,
        "trimHeader": true
      },
      "accessControlList": {
        "version": 0
      },
      "fields": [
        {
          "name": "Trip_Pickup_DateTime",
          "type": {
            "name": "VARCHAR"
          }
        },
        {
          "name": "Trip_Dropoff_DateTime",
          "type": {
            "name": "VARCHAR"
          }
        },
        {
          "name": "Passenger_Count",
          "type": {
            "name": "VARCHAR"
          }
        },
        {
          "name": "Trip_Distance",
          "type": {
            "name": "VARCHAR"
          }
        },
        {
          "name": "Total_Amt",
          "type": {
            "name": "VARCHAR"
          }
        }
      ],
      "approximateStatisticsAllowed": false
    }
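
    Once the response is parsed, the field names and types can be pulled out of it. The sketch below assumes the response has the shape shown above. The dictionary is a trimmed copy of that sample (two of the five fields), and field_types is a hypothetical helper name.

```python
# Trimmed copy of the sample response above, as it would look after JSON parsing.
entity = {
    "entityType": "dataset",
    "type": "PHYSICAL_DATASET",
    "path": ["DEV HDFS", "data", "nyctaxi", "yellow_tripdata_2009-01.csv"],
    "fields": [
        {"name": "Trip_Pickup_DateTime", "type": {"name": "VARCHAR"}},
        {"name": "Total_Amt", "type": {"name": "VARCHAR"}},
    ],
}


def field_types(entity):
    """Return a {field name: type name} mapping (hypothetical helper)."""
    # Entities without a "fields" key, such as folders, yield an empty mapping.
    return {f["name"]: f["type"]["name"] for f in entity.get("fields", [])}


print(field_types(entity))
# {'Trip_Pickup_DateTime': 'VARCHAR', 'Total_Amt': 'VARCHAR'}
```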
    

    Example: Get Source Folder by Path

    In this example, an HDFS source, my_hdfs_2, has a sub-folder (data/loans) that contains three folders: acquisition, acquisition-mini, and performance. Two of these folders are not promoted, and one is promoted to a PDS. The request retrieves information about the loans entity.

    Note:
    The samples below were generated with Postman.

    HTTP

    GET localhost:9047/api/v3/catalog/by-path/my_hdfs_2/data/loans
    

    Curl

    curl -X GET \
      http://localhost:9047/api/v3/catalog/by-path/my_hdfs_2/data/loans \
      -H 'Authorization: _dremioo8opojj6vn4ughkvcpalpr46d6' \
      -H 'Content-Type: application/json'
    

    Python

    import requests

    url = "http://localhost:9047/api/v3/catalog/by-path/my_hdfs_2/data/loans"

    # The token is sent in the Authorization header, as in the curl sample.
    headers = {
        "Authorization": "_dremioo8opojj6vn4ughkvcpalpr46d6",
        "Content-Type": "application/json",
    }

    # A GET request carries no body, so no payload is passed.
    response = requests.get(url, headers=headers)

    print(response.text)
    

    Response

    {
        "entityType": "folder",
        "id": "2ea08d02-13d3-419b-86cc-b39e7a8ee26b",
        "path": [
            "my_hdfs_2",
            "data",
            "loans"
        ],
        "tag": "0",
        "children": [
            {
                "id": "dremio:/my_hdfs_2/data/loans/\"acquisition\"",
                "path": [
                    "my_hdfs_2",
                    "data",
                    "loans",
                    "\"acquisition\""
                ],
                "type": "CONTAINER",
                "containerType": "FOLDER"
            },
            {
                "id": "cf771ed4-8ffc-49c6-b75c-b6ce4a518289",
                "path": [
                    "my_hdfs_2",
                    "data",
                    "loans",
                    "\"acquisition-mini\""
                ],
                "type": "DATASET",
                "datasetType": "PROMOTED"
            },
            {
                "id": "dremio:/my_hdfs_2/data/loans/\"performance\"",
                "path": [
                    "my_hdfs_2",
                    "data",
                    "loans",
                    "\"performance\""
                ],
                "type": "CONTAINER",
                "containerType": "FOLDER"
            }
        ],
        "accessControlList": {
            "version": "0"
        }
    }
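
    The children array can be walked to tell unpromoted folders from promoted datasets. The sketch below assumes the response shape shown above. The dictionary is a trimmed copy of that sample (two of the three children), and summarize_children is a hypothetical helper name.

```python
# Trimmed copy of the sample response above, as it would look after JSON parsing.
entity = {
    "entityType": "folder",
    "path": ["my_hdfs_2", "data", "loans"],
    "children": [
        {
            "path": ["my_hdfs_2", "data", "loans", '"acquisition"'],
            "type": "CONTAINER",
            "containerType": "FOLDER",
        },
        {
            "path": ["my_hdfs_2", "data", "loans", '"acquisition-mini"'],
            "type": "DATASET",
            "datasetType": "PROMOTED",
        },
    ],
}


def summarize_children(entity):
    """Return (display path, kind) pairs for each child (hypothetical helper)."""
    summary = []
    for child in entity.get("children", []):
        # In the sample, containers carry containerType and datasets carry
        # datasetType; fall back to the generic type if neither is present.
        kind = child.get("containerType") or child.get("datasetType") or child["type"]
        # Joining with "/" is for display only; it is not a URL-encoded path.
        summary.append(("/".join(child["path"]), kind))
    return summary


for path, kind in summarize_children(entity):
    print(kind, path)
```

    Note that the last path component of each child in the sample keeps its literal double quotes, so the display paths include them.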