Version: current [25.x]

Monitoring

As an administrator, you can monitor logs, usage, system telemetry, jobs, and Dremio nodes.

Logs

Logs are primarily for troubleshooting issues and monitoring the health of the deployment.

note

By default, Dremio uses the following locations to write logs:

  • Tarball - <DREMIO_HOME>/log
  • RPM - /var/log/dremio

Log Types

Log Type | Description
Audit | The audit.json file tracks all activities that users perform within Dremio. For details, see Audit Logging.
System | The following system logs are enabled by default:
  • access.log: HTTP access log for the Dremio web server; generated by coordinator nodes only.
  • server.gc: Garbage collection log.
  • server.log and json/server.json: Server logs generated in a text format (server.log) and JSON format (json/server.json). Admins can disable one of these formats.
  • server.out: Log for Dremio daemon standard out.
  • metadata_refresh.log: Log for refreshing metadata.
  • tracker.json: Tracker log.
Query | Query logging is enabled by default. The queries.json file contains the log of completed queries; it does not include queries currently in planning or execution. You can retrieve the same information that is in queries.json using the sys.jobs_recent system table. Query logs include the following information:
  • queryId: Unique ID of the executed query.
  • queryText: SQL query text.
  • start: Start time of the query.
  • finish: End time of the query.
  • outcome: Whether the query was completed or failed.
  • username: User who executed the query.
  • commandDescription: Type of the command; may be a regular SQL query execution job or another SQL command.
The query log may contain additional information depending on your Dremio configuration.
Warning | The hive.deprecated.function.warning.log file contains warnings for Hive functions that have been deprecated. To resolve warnings that are listed in this file, replace deprecated functions with a supported function. For example, to resolve a warning that mentions NVL, replace NVL with COALESCE.
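
The query log fields listed in the table can be summarized with a short script. The following is a minimal sketch, not an official Dremio tool. It assumes that queries.json holds one JSON object per line and that outcome is a string such as "COMPLETED" or "FAILED"; both are assumptions to verify against your own log files. The sample records and the summarize_query_log helper are hypothetical.

```python
import json
from collections import Counter


def summarize_query_log(lines):
    """Count query outcomes and queries per user from query log lines.

    Assumes one JSON object per line (an assumption about queries.json).
    """
    outcomes = Counter()
    per_user = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Use "UNKNOWN" when a field is missing from a record.
        outcomes[record.get("outcome", "UNKNOWN")] += 1
        per_user[record.get("username", "UNKNOWN")] += 1
    return outcomes, per_user


# Hypothetical sample records that use the field names from the table above.
sample = [
    '{"queryId": "q1", "queryText": "SELECT 1", "outcome": "COMPLETED", "username": "alice"}',
    '{"queryId": "q2", "queryText": "SELECT 2", "outcome": "FAILED", "username": "alice"}',
    '{"queryId": "q3", "queryText": "SELECT 3", "outcome": "COMPLETED", "username": "bob"}',
]

outcomes, per_user = summarize_query_log(sample)
print(dict(outcomes))
print(dict(per_user))
```

To run the sketch against a real log, pass an open file object instead of the sample list, for example the queries.json file in your log directory.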

Retrieving Logs from the Dremio Console (Enterprise)

Retrieve logs for Kubernetes deployments in the Dremio console at Settings > Support > Download Logs.

Prerequisites

  • You must be using Dremio v24.2+. Log collection is powered by Dremio Diagnostics Collector (DDC), which is bundled with v24.2+. If you are on an earlier version, you can install DDC manually to enable the Download Logs functionality in the Dremio console.
  • You must have the EXPORT DIAGNOSTICS privilege to view Download Logs options in Settings > Support.

Downloading Logs

To download logs:

  1. In the Dremio console, navigate to Settings > Support > Download Logs and click Start collecting data.
note

You may store a maximum of three log bundles. Delete log bundles as needed to start a new log collection if you reach the maximum.

We recommend the default Light collection, which provides 7 days of logs and completed queries in the queries.json file, for troubleshooting most issues. For more complex issues, select the Standard collection, which provides 7 days of logs and 28 days of completed queries in the queries.json file.

  2. When Dremio completes log collection, the log bundle appears in a list below Start collecting data. To download a log bundle, click Download next to the applicable bundle. Log bundles are available to download for 24 hours.

Retrieving Logs Directly from Kubernetes

To retrieve logs from Kubernetes, use the container console for Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). On AKS, you can also retrieve logs through the AKS container log monitoring described below.

Using the Container Console

All logs are written to the container's console (stdout) simultaneously. You can monitor these logs with the kubectl logs command:

Command for viewing logs using kubectl logs
kubectl logs [-f] <pod-name> [-c <container-name>]

Use the -f flag to continuously print new log entries to your terminal as they are generated.
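
If you want to consume these logs from a script, the kubectl logs invocation above can be wrapped as follows. This is a minimal sketch under stated assumptions: kubectl is on your PATH and configured for the cluster, and the pod name dremio-executor-0 is a hypothetical placeholder. The helper names kubectl_logs_args and stream_logs are illustrative and not part of Dremio or kubectl.

```python
import subprocess


def kubectl_logs_args(pod, follow=False, namespace=None, container=None):
    """Build the argument list for a `kubectl logs` call."""
    args = ["kubectl", "logs"]
    if namespace:
        args += ["-n", namespace]  # target a specific namespace
    if follow:
        args.append("-f")  # keep streaming new log entries
    args.append(pod)
    if container:
        args += ["-c", container]  # pick one container in the pod
    return args


def stream_logs(pod, **kwargs):
    """Yield log lines from kubectl (requires kubectl and cluster access)."""
    proc = subprocess.Popen(
        kubectl_logs_args(pod, **kwargs),
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        proc.terminate()
        proc.wait()


# Show the command that would run; the pod name is hypothetical.
print(kubectl_logs_args("dremio-executor-0", follow=True))
```

Building the argument list separately from running it keeps the command easy to inspect and avoids shell quoting issues, because the list is passed to the process directly.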

You can also write logs to a file on disk in addition to stdout. Read Writing Logs to a File for details.

Using the AKS Container

Azure provides integration with AKS clusters and Azure Log Analytics to monitor container logs. This is a standard practice that puts infrastructure in place to aggregate logs from containers into a central log store to analyze them.

AKS log monitoring is useful for the following reasons:

  • Monitoring logs across many pods individually can be overwhelming.
  • When a pod (for example, a Dremio executor) crashes and restarts, only the logs from the last pod are available.
  • If a pod is crashing regularly, the logs are lost, which makes it difficult to analyze the reasons for the crash.

For more information regarding AKS, see Azure Monitor features for Kubernetes monitoring.

Enabling Log Monitoring

You can enable log monitoring when creating an AKS cluster or after the cluster has been created.

Once logging is enabled, all your container stdout and stderr logs are collected by the infrastructure for you to analyze.

  1. While creating an AKS cluster, enable container monitoring. You can use an existing Log Analytics workspace or create a new one.
  2. In an existing AKS cluster where monitoring was not enabled during creation, go to Logs on the AKS cluster and enable it.

If you want to persist logs in the PVC, follow the instructions here.

Viewing Container Logs

To view all the container logs:

  1. Go to Monitoring > Logs.
  2. Use the filter option to see the logs from the containers that you are interested in.

Usage

Monitoring usage across your cluster makes it easier to observe patterns, analyze the resources being consumed by your data platform, and understand the impact on your users.

Catalog Usage (Enterprise)

Go to Settings > Monitor to view your catalog usage. When you open the Monitor page, you are directed to the Catalog Usage tab by default where you can see the following metrics:

  • Top 10 most queried datasets and how often the jobs on the dataset were accelerated

  • Top 10 most queried spaces and source folders

note

A source can be listed in the top 10 most queried spaces and source folders if the source contains a child dataset that was used in the query (for example, postgres.accounts). Queries of datasets in sub-folders (for example, s3.mybucket.iceberg_table) are classified by the sub-folder and not the source.

All datasets are assessed in the metrics on the Monitor page except for datasets in the system tables, the information schema, and home spaces.

The metrics on the Monitor page analyze only user queries. Refreshes of data reflections and metadata refreshes are excluded.

Jobs (Enterprise)

Go to Settings > Monitor > Jobs to open the Jobs tab and see an aggregate view of the following metrics for the jobs that are running on your cluster:

  • Total job count over the last 24 hours and the relative rate of failure/cancelation

  • Top 10 most active users based on the number of jobs they ran

  • Total number of completed and failed jobs over time

  • Jobs (completed and failed) grouped by the queue they ran on

  • Percentage of time that jobs spent in each state

  • Top 10 longest running jobs

To view all jobs and the details of specific jobs, see Viewing Jobs.

Resources (Enterprise)

Go to Settings > Monitor > Resources to open the Resources tab and see an aggregate view of the following metrics for the jobs and nodes running on your cluster:

  • Percentage of CPU and memory utilization for each coordinator and executor node

  • Top 10 most CPU and memory intensive jobs

  • Number of running executors

Cluster Usage

For each day, Dremio displays the number of unique users who executed jobs and the number of jobs executed. To view this data:

  1. Hover over the help icon in the side navigation bar.

  2. Click About Dremio in the menu.

  3. Click the Cluster Usage Data tab.

System Telemetry

Dremio exposes system telemetry metrics in Prometheus format by default. It is not necessary to configure an exporter to collect the metrics. Instead, you can specify the host and port number where metrics are exposed in the dremio.conf file and scrape the metrics with any Prometheus-compliant tool.

To specify the host and port number where metrics are exposed, add these two properties to the dremio.conf file:

  • services.web-admin.host: set to the desired host address (typically 0.0.0.0 or the IP address of the host where Dremio is running).
  • services.web-admin.port: set to any desired value that is greater than 1024.

For example:

Example host and port settings in dremio.conf
services.web-admin.host: "127.0.0.1"
services.web-admin.port: 9090

Restart Dremio after you update the dremio.conf file to make sure your changes take effect.

Access the exported Dremio system telemetry metrics at http://<yourHost>:<yourPort>/metrics.
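
As an illustration of reading this endpoint without a full Prometheus server, the following minimal sketch fetches the /metrics URL and parses the Prometheus text format into a dictionary. It is a simplified parser, not a complete implementation of the exposition format: it ignores HELP and TYPE comments and optional timestamps. The metric names in the sample are hypothetical and are not real Dremio metric names, and the helper names parse_metrics and fetch_metrics are illustrative.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text-format samples into {series: value}.

    Simplified: comments are skipped and timestamps are ignored.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Labels may contain spaces, so split after the closing brace.
            end = line.rindex("}") + 1
            name, rest = line[:end], line[end:].split()
        else:
            parts = line.split()
            name, rest = parts[0], parts[1:]
        samples[name] = float(rest[0])
    return samples


def fetch_metrics(host, port, timeout=5):
    """Fetch and parse metrics from http://<host>:<port>/metrics."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Hypothetical sample payload; real metric names will differ.
sample = """# HELP example_jobs_active Hypothetical gauge
# TYPE example_jobs_active gauge
example_jobs_active 3
example_memory_bytes{pool="heap"} 1.5e+09
"""
for series, value in parse_metrics(sample).items():
    print(series, value)
```

With the example settings shown earlier, a call such as fetch_metrics("127.0.0.1", 9090) would read the live endpoint from the host where Dremio is running.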

For more information about Prometheus metrics, read Types of Metrics in the Prometheus documentation.