Monitoring
As an administrator, you can monitor catalog usage and jobs in the Dremio console. You can also use the Dremio APIs and SQL to retrieve information about jobs and events for the projects in your organization.
Monitoring in the Dremio Console Enterprise
The Monitor page in the Dremio console allows you to monitor usage across your project, which makes it easier to observe patterns, analyze the resources being consumed by your data platform, and understand the impact on your users. You must be a member of the ADMIN
role to access the Monitor page.
Catalog Usage
The data visualizations on the Monitor page point you to the most queried data and folders in a catalog.
Go to Settings > Monitor to view your catalog usage. When you open the Monitor page, you are directed to the Catalog Usage tab by default where you can see the following metrics:
-
A table of the top 10 most queried datasets within the specified time range, including for each the number of linked jobs, the percentage of linked jobs in which the dataset was accelerated, and the total number of reflections defined on the dataset
-
A table of the top 10 most queried source folders within the specified time range, including for each the number of linked jobs and the top users of that folder
A source can be listed in the top 10 most queried source folders if the source contains a child dataset that was used in the query (for example, postgres.accounts
). Queries of datasets in sub-folders (for example, s3.mybucket.iceberg_table
) are classified by the sub-folder and not the source.
All datasets are assessed in the metrics on the Monitor page except for datasets in the system tables and the information schema.
The metrics on the Monitor page analyze only user queries. Refreshes of data reflections and metadata refreshes are excluded.
Jobs
The data visualizations on the Monitor page show the metrics for queries executed in your project, including statistics about performance and utilization.
Go to Settings > Monitor > Jobs to open the Jobs tab and see an aggregate view of the following metrics for the jobs that are running in your project:
-
A report of today's job count and failed/canceled rate in comparison to yesterday's metrics
-
A list of the top 10 most active users within the specified time range, including the number of linked jobs for each user
-
A graph showing the total number of completed and failed jobs over time (aggregated hourly or daily)
-
A graph of all completed and failed jobs according to their engine (aggregated hourly or daily)
-
A graph of all job states showing the percentage of time consumed for each state (aggregated hourly or daily)
-
A table of the top 10 longest running jobs within the specified time range, including the linked ID, duration, user, query type, and start time of each job
To examine all jobs and the details of specific jobs, see Viewing Jobs.
You can create reports of jobs in other BI tools by leveraging the sys.project.history.jobs
table.
Monitoring with Dremio APIs and SQL
Administrators can use the Dremio APIs and SQL to retrieve information about the jobs and events in every project in the organization. This information is useful for further monitoring and analysis.
Before you begin, make sure that you are assigned to the ADMIN role for the organization whose information you want to retrieve. You also need a personal access token to make the necessary API requests.
The code examples in this section are written in Python.
The procedure below provides individual code examples for retrieving project IDs, retrieving information for jobs and events, saving query results to Parquet files, and uploading the Parquet files to an AWS S3 bucket. See the combined example for a single code example that combines all of the steps.
- Get the IDs for all projects in the organization. In the code example for this step, the
get_projects
method uses the Projects API to get the project IDs.
In the following code example, replace <personal_access_token>
with your PAT.
To use the API control plane for the EU rather than the US, replace https://api.dremio.cloud/
with https://api.eu.dremio.cloud/
.
import requests
import json
dremio_server = "https://api.dremio.cloud/"
personal_access_token = "<personal_access_token>"
headers = {
'Authorization': "Bearer " + personal_access_token,
'Content-Type': "application/json"
}
def api_get(endpoint: string) -> Response:
return requests.get(f'{dremio_server}/{endpoint}', headers=headers)
def get_projects() -> dict:
"""
Get all projects in the Dremio Cloud organization
:return: Dictionary of project IDs and project names
"""
projects = dict()
projects_response = api_get('v0/projects')
for project in projects_response.json():
projects[project['id']] = project['name']
return projects
- Run a SQL query to get the jobs or events for the project. The code examples for this step show how to use the SQL API to submit a SQL query, get all jobs during a specific period with the
get_jobs
method, and get all events in thesys.project.history.events
system table during a specific period with theget_events
method.
def api_post(endpoint: string, body=None) -> Response:
return requests.post(f'{dremio_server}/{endpoint}',
headers=headers, data=json.dumps(body))
def run_sql(project_id: string, query: string) -> string:
"""
Run a SQL query
:param project_id: project ID
:param query: SQL query
:return: query job ID
"""
query_response = api_post(f'v0/projects/{project_id}/sql', body={'sql': query})
job_id = query_response.json()['id']
return job_id
def api_post(endpoint: string, body=None) -> Response:
return requests.post(f'{dremio_server}/{endpoint}',
headers=headers, data=json.dumps(body))
def get_jobs(project_id: string, start_time: string, end_time: string) -> string:
"""
Run SQL query to get all jobs in a project during the specified time period
:param project_id: project ID
:param start_time: start timestamp (inclusive)
:param end_time: end timestamp (exclusive)
:return: query job ID
"""
query_response = api_post(f'v0/projects/{project_id}/sql', body={'sql': query})
job_id = run_sql(project_id, f'SELECT * FROM sys.project.history.jobs '
f'WHERE "submitted_ts" >= \'{start_time}\' '
f'AND "submitted_ts" < \'{end_time}\'')
return job_id
def get_events(project_id: string, start_time: string, end_time: string) -> string:
"""
Run SQL query to get all events in sys.project.history.events during the specified time period
:param project_id: project ID
:param start_time: start timestamp (inclusive)
:param end_time: end timestamp (exclusive)
:return: query job ID
"""
job_id = run_sql(project_id, f'SELECT * FROM sys.project.history.events '
f'WHERE "timestamp" >= \'{start_time}\' '
f'AND "timestamp" < \'{end_time}\'')
return job_id
- Check the status of the query to get jobs or events. In the code example for this step, the
wait_for_job_complete
method periodically checks and returns the query job state and prints out the final job status when the query is complete.
def wait_for_job_complete(project_id: string, job_id: string) -> string:
"""
Wait for a query job to complete
:param project_id: project ID
:param job_id: job ID
:return: if the job completed successfully, True; otherwise, False
"""
while True:
time.sleep(1)
job = api_get(f'v0/projects/{project_id}/job/{job_id}')
job_state = job.json()["jobState"]
if job_state == 'COMPLETED':
print("Job complete.")
break
elif job_state == 'FAILED':
print("Job failed.", job.json()['errorMessage'])
break
elif job_state == 'CANCELED':
print("Job canceled.")
break
return job_state
- Download the result for the query to get jobs or events and save it to a Parquet file. In the code example for this step, the
save_job_results_to_parquet
method downloads the query result and, if the result contains at least one row, saves the result to a single Parquet file.
def save_job_results_to_parquet(project_id: string, job_id: string,
parquet_file_name: string) -> bool:
"""
Download the query result and save it to a Parquet file
:param project_id: project ID
:param job_id: query job ID
:param parquet_file_name: file name to save the job result
:return: if the query returns more than 0 rows and parquet file is saved, True; otherwise False
"""
offset = 0
rows_downloaded = 0
rows = []
while True:
job_result = api_get(f'v0/projects/{project_id}/job/{job_id}/'
f'results/?offset={offset}&limit=500')
job_result_json = job_result.json()
row_count = job_result_json['rowCount']
rows_downloaded += len(job_result_json['rows'])
rows += job_result_json['rows']
if rows_downloaded >= row_count:
break
offset += 500
print(rows_downloaded, "rows")
if rows_downloaded > 0:
py_rows = pyarrow.array(job_result_json['rows'])
table = pyarrow.Table.from_struct_array(py_rows)
pyarrow.parquet.write_table(table, parquet_file_name)
return True
return False
- If desired, you can use the Boto3 library to upload the Parquet file to an AWS S3 bucket.
def upload_file(file_name: string, bucket: string, folder: string):
"""Upload Parquet file to an S3 bucket with Boto3
:param file_name: File to upload
:param bucket: Bucket to upload to
:param folder: Folder to upload to
:return: True if file was uploaded, else False
"""
# Upload the file
s3_client = boto3.client('s3')
try:
response = s3_client.upload_file(file_name, bucket, f'{folder}/{file_name}')
except ClientError as e:
print(e)
return False
return True
Combined Example
The following code example combines the steps above to get all jobs and events from all projects during a specific period, save the query results to Parquet files, and upload the Parquet files to an AWS S3 bucket. The parameter start
is the start timestamp (inclusive) and the parameter end
is the end timestamp (exclusive).
All jobs in each project during the specified time period are saved in an individual Parquet file with file name jobs_<project_id><start>.parquet
. All events in each project during the specified time period are saved in one Parquet file with file name events_<project_id><start>.parquet
.
def main(start: string, end: string):
"""
Get all jobs and events from all projects during the specified time period, save the results in Parquet files, and upload the files to an AWS S3 bucket.
:param start: start timestamp (inclusive, in format "YYYY-MM-DD" or "YYYY-MM-DD hh:mm:ss"
:param end: end timestamp (exclusive, in format "YYYY-MM-DD" or "YYYY-MM-DD hh:mm:ss"
"""
projects = get_projects()
print("Projects in organization:")
print(projects)
# Get jobs for each project
for project_id in projects:
print("Get jobs for project", projects[project_id])
# run query
job_id = get_jobs(project_id, start, end)
# check job status
job_state = wait_for_job_complete(project_id, job_id)
if job_state == "COMPLETED":
file_name = f'jobs_{project_id}{start}.parquet'
if save_job_results_to_parquet(project_id, job_id, file_name):
upload_file(file_name, 'S3_BUCKET_NAME', 'dremio/jobs')
for project_id in projects:
print("Get events for project", projects[project_id])
# run query
job_id = get_events(project_id, start, end)
# check job status
job_state = wait_for_job_complete(project_id, job_id)
if job_state == "COMPLETED":
file_name = f'events_{project_id}{start}.parquet'
if save_job_results_to_parquet(project_id, job_id, file_name):
upload_file(file_name, 'S3_BUCKET_NAME', 'dremio/events')
if __name__ == "__main__":
parser = argparse.ArgumentParser(
description='Demo of collecting jobs and events from Dremio Cloud Projects')
parser.add_argument('start',
help='start timestamp (inclusive, in format "YYYY-MM-DD" or "YYYY-MM-DD hh:mm:ss")')
parser.add_argument('end',
help='end timestamp (exclusive, in format "YYYY-MM-DD" or "YYYY-MM-DD hh:mm:ss")')
args = parser.parse_args()
main(args.start, args.end)