This API retrieves information about a specific catalog entity (source, space, folder, file, or dataset) using its path. Where applicable, the entity's children are also returned, each with its ID, path, type, and containerType.
GET /api/v3/catalog/by-path/{path}
Path is the Dremio path for the entity, using / as a separator. Each path component should be URL-escaped.
For example, given a source called MySource which has a folder called MyFolder that contains a dataset called MyDataset, the URL will look like this:
GET /api/v3/catalog/by-path/MySource/MyFolder/MyDataset
If the dataset were called My?Dataset, then the URL would be:
GET /api/v3/catalog/by-path/MySource/MyFolder/My%3FDataset
This is because ? is a special character in URLs (it starts the query string), so it must be URL-escaped.
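The escaping rule above can be sketched in Python with the standard library. This is a minimal illustration, not part of the Dremio API: the helper name build_by_path_url is hypothetical, and the base URL is assumed from the localhost:9047 examples on this page.

```python
from urllib.parse import quote

# Assumption: host and port taken from the examples on this page.
BASE_URL = "http://localhost:9047"


def build_by_path_url(components, base_url=BASE_URL):
    """Build a by-path URL from a list of unescaped path components."""
    # safe="" makes quote() escape "/" as well, so a slash inside a
    # component name cannot be mistaken for the path separator.
    escaped = [quote(part, safe="") for part in components]
    return f"{base_url}/api/v3/catalog/by-path/" + "/".join(escaped)


print(build_by_path_url(["MySource", "MyFolder", "My?Dataset"]))
```

Escaping each component separately, rather than the joined path, keeps the / separators intact while still encoding characters such as ? or spaces inside a name.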
The CatalogEntity is one of the following:
403 - User does not have permission to view the catalog entity.
404 - A catalog entity with the specified path could not be found.
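A client might translate these status codes into readable messages before attempting to parse the body. The sketch below is one way to do that. The function name describe_catalog_status is hypothetical, and the handling of codes other than 403 and 404 is an assumption, since this section does not describe them.

```python
def describe_catalog_status(status_code):
    """Return a short explanation for a by-path response status code."""
    if status_code == 403:
        return "User does not have permission to view the catalog entity."
    if status_code == 404:
        return "A catalog entity with the specified path could not be found."
    if 200 <= status_code < 300:
        # Assumption: any 2xx code means the entity was returned.
        return "OK"
    # Other codes are not documented in this section.
    return f"Unexpected status {status_code}"


for code in (200, 403, 404):
    print(code, describe_catalog_status(code))
```

With the requests library, this would typically be called as describe_catalog_status(response.status_code).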
In this example, information is requested about a physical dataset yellow_tripdata_2009-01.csv, found in the HDFS source called DEV HDFS under the directory path data/nyctaxi.
GET localhost:9047/api/v3/catalog/by-path/DEV%20HDFS/data/nyctaxi/yellow_tripdata_2009-01.csv
curl -X GET \
http://localhost:9047/api/v3/catalog/by-path/DEV%20HDFS/data/nyctaxi/yellow_tripdata_2009-01.csv \
-H "Content-Type: application/json" \
-H "Authorization: _dremiohs85l11k2mh0b10l51ett9fsca"
For a physical dataset like this, the response body includes information about formatting and datatypes.
{
"entityType": "dataset",
"id": "8a2df787-2e28-49ef-b961-52e214672d33",
"type": "PHYSICAL_DATASET",
"path": [
"DEV HDFS",
"data",
"nyctaxi",
"yellow_tripdata_2009-01.csv"
],
"createdAt": "2019-01-10T16:10:29.676Z",
"tag": "0",
"format": {
"type": "Text",
"ctime": 0,
"isFolder": false,
"location": "/data/nyctaxi/yellow_tripdata_2009-01.csv",
"fieldDelimiter": ",",
"skipFirstLine": false,
"extractHeader": true,
"quote": "\"",
"comment": "#",
"escape": "\"",
"lineDelimiter": "\r\n",
"autoGenerateColumnNames": true,
"trimHeader": true
},
"accessControlList": {
"version": 0
},
"fields": [
{
"name": "Trip_Pickup_DateTime",
"type": {
"name": "VARCHAR"
}
},
{
"name": "Trip_Dropoff_DateTime",
"type": {
"name": "VARCHAR"
}
},
{
"name": "Passenger_Count",
"type": {
"name": "VARCHAR"
}
},
{
"name": "Trip_Distance",
"type": {
"name": "VARCHAR"
}
},
{
"name": "Total_Amt",
"type": {
"name": "VARCHAR"
}
}
],
"approximateStatisticsAllowed": false
}
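To show how the datatype information in such a response might be consumed, here is a small sketch that maps each field name to its type name. The helper field_types is hypothetical, and the sample dictionary is a trimmed copy of the response above, not a complete entity.

```python
def field_types(entity):
    """Map each field name to its type name for a dataset entity."""
    # Entities without a "fields" key (for example, folders) yield {}.
    return {f["name"]: f["type"]["name"] for f in entity.get("fields", [])}


# Trimmed copy of the response body shown above.
sample = {
    "entityType": "dataset",
    "type": "PHYSICAL_DATASET",
    "fields": [
        {"name": "Trip_Pickup_DateTime", "type": {"name": "VARCHAR"}},
        {"name": "Passenger_Count", "type": {"name": "VARCHAR"}},
    ],
}

for name, type_name in field_types(sample).items():
    print(f"{name}: {type_name}")
```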
In this example, an HDFS source, my_hdfs_2, has a sub-folder (data/loans) containing three folders (acquisition, acquisition-mini, and performance). Two of the folders are not promoted, and one is promoted to a physical dataset (PDS). The request retrieves information about the loans entity.
The samples below were generated with Postman.
GET localhost:9047/api/v3/catalog/by-path/my_hdfs_2/data/loans
curl -X GET \
http://localhost:9047/api/v3/catalog/by-path/my_hdfs_2/data/loans \
-H 'Authorization: _dremioo8opojj6vn4ughkvcpalpr46d6' \
-H 'Content-Type: application/json' \
-H 'Postman-Token: 6a8f1b11-6340-44c0-ae74-899b6c4df7bd' \
-H 'cache-control: no-cache'
import requests

# Endpoint for the loans folder on the local Dremio instance.
url = "http://localhost:9047/api/v3/catalog/by-path/my_hdfs_2/data/loans"
headers = {
    "Authorization": "_dremioo8opojj6vn4ughkvcpalpr46d6",
    "Content-Type": "application/json",
    "cache-control": "no-cache",
    "Postman-Token": "f7f851aa-fbdd-403d-a400-1b816d93dfae",
}
# A GET request carries no body, so no payload is sent.
response = requests.get(url, headers=headers)
print(response.text)
{
"entityType": "folder",
"id": "2ea08d02-13d3-419b-86cc-b39e7a8ee26b",
"path": [
"my_hdfs_2",
"data",
"loans"
],
"tag": "0",
"children": [
{
"id": "dremio:/my_hdfs_2/data/loans/\"acquisition\"",
"path": [
"my_hdfs_2",
"data",
"loans",
"\"acquisition\""
],
"type": "CONTAINER",
"containerType": "FOLDER"
},
{
"id": "cf771ed4-8ffc-49c6-b75c-b6ce4a518289",
"path": [
"my_hdfs_2",
"data",
"loans",
"\"acquisition-mini\""
],
"type": "DATASET",
"datasetType": "PROMOTED"
},
{
"id": "dremio:/my_hdfs_2/data/loans/\"performance\"",
"path": [
"my_hdfs_2",
"data",
"loans",
"\"performance\""
],
"type": "CONTAINER",
"containerType": "FOLDER"
}
],
"accessControlList": {
"version": "0"
}
}
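The children array in this response distinguishes the promoted folder (type DATASET) from the unpromoted ones (type CONTAINER). The sketch below separates the two groups. The helper split_children is hypothetical, the sample is a trimmed copy of the response above, and stripping the surrounding double quotes from the last path component is an assumption made only for display.

```python
def split_children(entity):
    """Return (dataset_names, container_names) for a folder entity."""
    datasets, containers = [], []
    for child in entity.get("children", []):
        # The last path component is the child's own name; the sample
        # response wraps it in double quotes, removed here for display.
        name = child["path"][-1].strip('"')
        if child["type"] == "DATASET":
            datasets.append(name)
        elif child["type"] == "CONTAINER":
            containers.append(name)
    return datasets, containers


# Trimmed copy of the response body shown above.
sample = {
    "entityType": "folder",
    "children": [
        {"path": ["my_hdfs_2", "data", "loans", '"acquisition"'],
         "type": "CONTAINER", "containerType": "FOLDER"},
        {"path": ["my_hdfs_2", "data", "loans", '"acquisition-mini"'],
         "type": "DATASET", "datasetType": "PROMOTED"},
        {"path": ["my_hdfs_2", "data", "loans", '"performance"'],
         "type": "CONTAINER", "containerType": "FOLDER"},
    ],
}

datasets, containers = split_children(sample)
print("datasets:", datasets)
print("containers:", containers)
```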