Promote a File/Folder to a Physical Dataset
This API promotes a file or folder in a file-based source to a physical dataset (PDS). The supplied path determines which entity is promoted.
Promotion converts the file or folder to a dataset. Because the dataset is a new entity, it receives a new ID.
Note:
To unpromote a physical dataset (PDS), delete the dataset. This reverts the PDS to its original form (a folder or file).
To promote a file or folder, you need the following:
- ID (encoded) of the folder or file. Use GET /catalog/by-path to obtain information about the file or folder.
- Request input. At minimum: entityType, path, type, and format.
- Dataset format.
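The encoded ID used in the request URL is the entity ID with URL-reserved characters percent-encoded. The following is a minimal sketch of that encoding using Python's standard library. The helper name `encode_catalog_id` is hypothetical, and the unencoded ID is simply the decoded form of the encoded ID used in the example on this page.

```python
from urllib.parse import quote, unquote


def encode_catalog_id(raw_id: str) -> str:
    """Percent-encode an ID so it can be used as a single URL path segment.

    safe="" makes quote() encode "/" as well, which it leaves alone by default.
    """
    return quote(raw_id, safe="")


# Hypothetical helper applied to the decoded form of the example ID.
encoded = encode_catalog_id("dremio:/my_hdfs_2/data/loans/acquisition-mini")
print(encoded)
print(unquote(encoded))  # decoding restores the original ID
```

Passing `safe=""` matters here: without it, the slashes in the ID would stay unencoded and be read as extra path segments in the URL.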
Syntax
Method and URL
POST /api/v3/catalog/{id}
Request Input
See Dataset for more information.
{
"entityType": "dataset",
"path": [String],
"type": String ["PHYSICAL_DATASET"],
"format": DatasetFormat [required for promoting datasets]
}
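As a sketch, the request body can be assembled as a Python dictionary and serialized with `json.dumps`. The function name `build_promote_request` is an assumption made for illustration; the keys and the `PHYSICAL_DATASET` value come from the request input above.

```python
import json


def build_promote_request(path, dataset_format):
    """Return the request body for promoting the entity at `path`.

    `path` is a list of path components and `dataset_format` is a dict
    describing the dataset format, for example {"type": "Text", ...}.
    """
    if not path:
        raise ValueError("path must contain at least one component")
    return {
        "entityType": "dataset",
        "path": list(path),
        "type": "PHYSICAL_DATASET",
        "format": dict(dataset_format),
    }


body = build_promote_request(
    ["my_hdfs_2", "data", "loans", "acquisition-mini"],
    {"type": "Text", "fieldDelimiter": "|"},
)
print(json.dumps(body, indent=2))
```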
Response Output
See Dataset for more information.
{
"entityType": "dataset" [immutable, generated by Dremio],
"id": String [immutable, generated by Dremio],
"type": String ["PHYSICAL_DATASET", "VIRTUAL_DATASET"] [immutable],
"path": [String] [immutable after creation],
"createdAt": String (RFC3339 date) [immutable, generated by Dremio],
"tag": String [immutable, generated by Dremio],
"format": DatasetFormat [optional, required for promoted datasets],
"accessControlList": DatasetAccessControlList [optional],
"owner": DatasetOwner [optional],
"accelerationRefreshPolicy": DatasetAccelerationRefreshPolicy [optional, only for physical datasets in a source],
"fields": [DatasetField] [immutable],
"approximateStatisticsAllowed": Boolean [optional, introduced in Dremio 2.1.0]
}
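One way a client might read the response is sketched below, assuming the body has already been parsed into a dictionary (for example with `response.json()` from the `requests` library). The helper name `summarize_dataset` is hypothetical, and the sample values are a trimmed copy of the response example on this page.

```python
def summarize_dataset(dataset: dict) -> dict:
    """Pick out the values a caller typically needs after promotion."""
    return {
        "id": dataset["id"],  # new ID assigned to the promoted dataset
        "type": dataset["type"],
        "path": dataset["path"],
        # "format" is marked optional, so fall back to None if it is absent.
        "format": dataset.get("format", {}).get("type"),
        # Defensive default: an empty list if "fields" is missing.
        "columns": [field["name"] for field in dataset.get("fields", [])],
    }


# Trimmed sample shaped like the response output described above.
sample = {
    "entityType": "dataset",
    "id": "cf771ed4-8ffc-49c6-b75c-b6ce4a518289",
    "type": "PHYSICAL_DATASET",
    "path": ["my_hdfs_2", "data", "loans", "acquisition-mini"],
    "format": {"type": "Text"},
    "fields": [
        {"name": "A", "type": {"name": "VARCHAR"}},
        {"name": "B", "type": {"name": "VARCHAR"}},
    ],
}
summary = summarize_dataset(sample)
print(summary)
```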
Response Codes
400
- The supplied CatalogEntity object is invalid.
403
- User does not have permission to create the catalog entity.
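The following is a small sketch of how a client could turn these status codes into messages. Only 400 and 403 are documented here; treating any 2xx code as success and every other code as unexpected is an assumption, and `explain_promote_status` is a hypothetical helper.

```python
# Messages for the status codes documented for this endpoint.
PROMOTE_ERRORS = {
    400: "The supplied CatalogEntity object is invalid.",
    403: "User does not have permission to create the catalog entity.",
}


def explain_promote_status(status_code: int) -> str:
    """Return a human-readable explanation for a promote response code."""
    if 200 <= status_code < 300:
        return "OK"  # assumption: any 2xx code means the promotion succeeded
    return PROMOTE_ERRORS.get(
        status_code, f"Unexpected status code: {status_code}"
    )


for code in (200, 400, 403, 500):
    print(code, explain_promote_status(code))
```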
Example: Promote a Folder
In this example, the folder acquisition-mini in an HDFS source, my_hdfs_2, is promoted to a PDS. We have the following information about the folder:
- Encoded ID:
dremio%3A%2Fmy_hdfs_2%2Fdata%2Floans%2Facquisition-mini
- Path:
[ "my_hdfs_2", "data", "loans", "acquisition-mini" ]
- Dataset format:
{ "type": "Text", "fieldDelimiter": "|", "lineDelimiter": "\n", "escape": "\"", "skipFirstLine": false, "extractHeader": false, "trimHeader": false, "autoGenerateColumnNames": true }
Note:
Postman was used to generate samples.
HTTP
POST localhost:9047/api/v3/catalog/dremio%3A%2Fmy_hdfs_2%2Fdata%2Floans%2Facquisition-mini
Raw Body Input
{
"entityType": "dataset",
"path": [
"my_hdfs_2",
"data",
"loans",
"acquisition-mini"
],
"type": "PHYSICAL_DATASET",
"format": {
"type": "Text",
"fieldDelimiter": "|",
"lineDelimiter": "\n",
"escape": "\"",
"skipFirstLine": false,
"extractHeader": false,
"trimHeader": false,
"autoGenerateColumnNames": true
}
}
Curl
curl -X POST \
http://localhost:9047/api/v3/catalog/dremio%3A%2Fmy_hdfs_2%2Fdata%2Floans%2Facquisition-mini \
-H 'Authorization: _dremioo8opojj6vn4ughkvcpalpr46d6' \
-H 'Content-Type: application/json' \
-d '{
"entityType": "dataset",
"path": [
"my_hdfs_2",
"data",
"loans",
"acquisition-mini"
],
"type": "PHYSICAL_DATASET",
"format": {
"type": "Text",
"fieldDelimiter": "|",
"lineDelimiter": "\n",
"escape": "\"",
"skipFirstLine": false,
"extractHeader": false,
"trimHeader": false,
"autoGenerateColumnNames": true
}
}'
Python
import requests
url = "http://localhost:9047/api/v3/catalog/dremio%3A%2Fmy_hdfs_2%2Fdata%2Floans%2Facquisition-mini"
# Request body as a dict; passing it via json= serializes it and sets Content-Type: application/json.
payload = {
    "entityType": "dataset",
    "path": ["my_hdfs_2", "data", "loans", "acquisition-mini"],
    "type": "PHYSICAL_DATASET",
    "format": {
        "type": "Text", "fieldDelimiter": "|", "lineDelimiter": "\n", "escape": "\"",
        "skipFirstLine": False, "extractHeader": False, "trimHeader": False,
        "autoGenerateColumnNames": True,
    },
}
headers = {"Authorization": "_dremioo8opojj6vn4ughkvcpalpr46d6"}
response = requests.post(url, json=payload, headers=headers)
print(response.text)
Response
{
"entityType": "dataset",
"id": "cf771ed4-8ffc-49c6-b75c-b6ce4a518289",
"type": "PHYSICAL_DATASET",
"path": [
"my_hdfs_2",
"data",
"loans",
"acquisition-mini"
],
"createdAt": "2019-03-26T18:56:57.085Z",
"tag": "0",
"format": {
"type": "Text",
"ctime": 0,
"isFolder": true,
"location": "/data/loans/acquisition-mini",
"fieldDelimiter": "|",
"skipFirstLine": false,
"extractHeader": false,
"quote": "\"",
"comment": "#",
"escape": "\"",
"lineDelimiter": "\n",
"autoGenerateColumnNames": true,
"trimHeader": false
},
"accessControlList": {
"version": "0"
},
"owner": {
"ownerId": "a430ed7f-7142-4e1f-ba7d-94173afdc9a3",
"ownerType": "USER"
},
"fields": [
{
"name": "A",
"type": {
"name": "VARCHAR"
}
},
{
"name": "B",
"type": {
"name": "VARCHAR"
}
},
{
"name": "C",
"type": {
"name": "VARCHAR"
}
},
{
"name": "D",
"type": {
"name": "VARCHAR"
}
},
{
"name": "E",
"type": {
"name": "VARCHAR"
}
},
{
"name": "F",
"type": {
"name": "VARCHAR"
}
},
{
"name": "G",
"type": {
"name": "VARCHAR"
}
},
{
"name": "H",
"type": {
"name": "VARCHAR"
}
},
{
"name": "I",
"type": {
"name": "VARCHAR"
}
},
{
"name": "J",
"type": {
"name": "VARCHAR"
}
},
{
"name": "K",
"type": {
"name": "VARCHAR"
}
},
{
"name": "L",
"type": {
"name": "VARCHAR"
}
},
{
"name": "M",
"type": {
"name": "VARCHAR"
}
},
{
"name": "N",
"type": {
"name": "VARCHAR"
}
},
{
"name": "O",
"type": {
"name": "VARCHAR"
}
},
{
"name": "P",
"type": {
"name": "VARCHAR"
}
},
{
"name": "Q",
"type": {
"name": "VARCHAR"
}
},
{
"name": "R",
"type": {
"name": "VARCHAR"
}
},
{
"name": "S",
"type": {
"name": "VARCHAR"
}
},
{
"name": "T",
"type": {
"name": "VARCHAR"
}
},
{
"name": "U",
"type": {
"name": "VARCHAR"
}
},
{
"name": "V",
"type": {
"name": "VARCHAR"
}
},
{
"name": "W",
"type": {
"name": "VARCHAR"
}
}
],
"approximateStatisticsAllowed": false
}