Databricks

Monitor your Databricks clusters via their multiple APIs!

Overview

This OneAgent Extension allows you to collect metrics from your embedded Ganglia instance, the Apache Spark APIs, and/or the Databricks API on your Databricks Cluster.

NOTE: Databricks Runtime v13+ no longer supports Ganglia; use the Spark and Databricks API options in the configuration instead.
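
The Spark monitoring REST API that the extension can poll is also reachable directly on the driver node, which is useful for verifying connectivity before enabling the extension. Below is a minimal sketch, assuming the default Spark UI port 4040 on the driver; ports, TLS, and authentication vary by deployment:

```python
import requests

# Assumption: the Spark UI is served on the driver at its default port (4040).
SPARK_API = "http://localhost:4040/api/v1"

def spark_get(path: str):
    """GET a path from Spark's monitoring REST API and return parsed JSON."""
    resp = requests.get(f"{SPARK_API}{path}", timeout=10)
    resp.raise_for_status()
    return resp.json()

# Each running Spark application exposes jobs, stages, executors, and storage data.
for app in spark_get("/applications"):
    jobs = spark_get(f"/applications/{app['id']}/jobs")
    print(app["id"], "running jobs:", sum(1 for j in jobs if j["status"] == "RUNNING"))
```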

This is intended for users who:

  • Have Databricks cluster(s) on which they would like to monitor job statuses and other important job- and cluster-level metrics

  • Want to analyze uptime and autoscaling issues of their Databricks cluster(s)

This enables you to:

  • Monitor job, cluster, and infrastructure metrics
  • Detect long upscaling times
  • Detect and filter Driver and Worker types
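
Once the extension reports these series, they can be read back like any other Dynatrace metric. Below is a minimal sketch against the Dynatrace Metrics v2 API; the environment URL and token are placeholders, and the metric key is one of those listed under Feature sets below:

```python
import requests

# Placeholders: your environment URL and an API token with the metrics.read scope.
DT_ENV = "https://abc12345.live.dynatrace.com"
API_TOKEN = "dt0c01.EXAMPLE"

resp = requests.get(
    f"{DT_ENV}/api/v2/metrics/query",
    headers={"Authorization": f"Api-Token {API_TOKEN}"},
    params={"metricSelector": "databricks.spark.job.duration:avg", "resolution": "5m"},
    timeout=10,
)
resp.raise_for_status()
for series in resp.json()["result"][0]["data"]:
    print(series["dimensions"], series["values"][-5:])
```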

Extension content

Content type | Number of items included
screen injections | 18
screen metric tables | 1
metric metadata | 74
screen chart groups | 16
document dashboard | 1
screen dql table | 1
dashboards | 1

Feature sets

Below is a complete list of the feature sets provided in this version; the metrics included in each feature set are listed in the tables that follow. Individual feature sets can be activated or deactivated by your administrator during configuration to fit your needs.

Metric name | Metric key | Description | Unit
RDD Count | databricks.spark.rdd_count.gauge | Total number of Resilient Distributed Datasets currently tracked by the Spark application | Count
RDD Partitions | databricks.spark.rdd.num_partitions | Total number of partitions across all Resilient Distributed Datasets | Count
RDD Cached Partitions | databricks.spark.rdd.num_cached_partitions | Number of Resilient Distributed Dataset partitions currently cached in memory or disk | Count
RDD Memory Used | databricks.spark.rdd.memory_used | Amount of memory used to store Resilient Distributed Dataset data | Byte
RDD Disk Used | databricks.spark.rdd.disk_used | Amount of disk space used to store Resilient Distributed Dataset data | Byte
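
These RDD figures mirror what Spark's REST API reports per RDD under its storage endpoint. A minimal standalone sketch; the driver host and port are assumptions:

```python
import requests

SPARK_API = "http://localhost:4040/api/v1"  # assumption: default Spark UI port on the driver

app_id = requests.get(f"{SPARK_API}/applications", timeout=10).json()[0]["id"]
for rdd in requests.get(f"{SPARK_API}/applications/{app_id}/storage/rdd", timeout=10).json():
    # Field names as exposed by Spark's monitoring REST API.
    print(rdd["name"], rdd["numPartitions"], rdd["numCachedPartitions"],
          rdd["memoryUsed"], rdd["diskUsed"])
```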
Metric name | Metric key | Description | Unit
Streaming Batch Duration | databricks.spark.streaming.statistics.batch_duration | Time interval configured for each streaming batch | MilliSecond
Streaming Receivers | databricks.spark.streaming.statistics.num_receivers | Total number of receivers configured for the streaming job | Count
Streaming Active Receivers | databricks.spark.streaming.statistics.num_active_receivers | Number of receivers actively ingesting data | Count
Streaming Inactive Receivers | databricks.spark.streaming.statistics.num_inactive_receivers | Number of receivers that are currently inactive | Count
Streaming Completed Batches | databricks.spark.streaming.statistics.num_total_completed_batches.count | Total number of batches that have been fully processed | Count
Streaming Retained Completed Batches | databricks.spark.streaming.statistics.num_retained_completed_batches.count | Number of completed batches retained in memory for monitoring or debugging | Unspecified
Streaming Active Batches | databricks.spark.streaming.statistics.num_active_batches | Number of streaming batches currently being processed | Count
Streaming Processed Records | databricks.spark.streaming.statistics.num_processed_records.count | Total number of records processed across all batches | Count
Streaming Received Records | databricks.spark.streaming.statistics.num_received_records.count | Total number of records received from all sources | Count
Streaming Avg Input Rate | databricks.spark.streaming.statistics.avg_input_rate | Average number of records received per second across batches | Byte
Streaming Avg Scheduling Delay | databricks.spark.streaming.statistics.avg_scheduling_delay | Average delay between batch creation and start of processing | MilliSecond
Streaming Avg Processing Time | databricks.spark.streaming.statistics.avg_processing_time | Average time taken to process each batch | MilliSecond
Streaming Avg Total Delay | databricks.spark.streaming.statistics.avg_total_delay | Average total delay from data ingestion to processing completion | MilliSecond
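
For DStream-based jobs, Spark exposes these statistics at the streaming endpoint of its REST API, which responds only while a StreamingContext is active. A minimal sketch under the same driver/port assumption as above:

```python
import requests

SPARK_API = "http://localhost:4040/api/v1"  # assumption: default Spark UI port on the driver

app_id = requests.get(f"{SPARK_API}/applications", timeout=10).json()[0]["id"]
stats = requests.get(f"{SPARK_API}/applications/{app_id}/streaming/statistics",
                     timeout=10).json()
print("batch duration (ms):", stats["batchDuration"])
# Averages may be absent until at least one batch has completed, hence .get().
print("active batches:", stats["numActiveBatches"],
      "avg scheduling delay (ms):", stats.get("avgSchedulingDelay"))
```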
Metric name | Metric key | Description | Unit
Application Count | databricks.spark.application_count.gauge | Number of applications running on Databricks | Count
Metric name | Metric key | Description | Unit
Stage Active Tasks | databricks.spark.job.stage.num_active_tasks | Number of tasks currently running in the stage | Count
Stage Completed Tasks | databricks.spark.job.stage.num_complete_tasks | Number of tasks that have successfully completed in the stage | Count
Stage Failed Tasks | databricks.spark.job.stage.num_failed_tasks | Number of tasks that failed during execution in the stage | Count
Stage Killed Tasks | databricks.spark.job.stage.num_killed_tasks | Number of tasks that were killed (e.g., due to job cancellation or speculative execution) | Count
Stage Executor Run Time | databricks.spark.job.stage.executor_run_time | Total time executors spent running tasks in the stage | MilliSecond
Stage Input Bytes | databricks.spark.job.stage.input_bytes | Total number of bytes read from input sources in the stage | Byte
Stage Input Records | databricks.spark.job.stage.input_records | Total number of records read from input sources in the stage | Count
Stage Output Bytes | databricks.spark.job.stage.output_bytes | Total number of bytes written to output destinations in the stage | Byte
Stage Output Records | databricks.spark.job.stage.output_records | Total number of records written to output destinations in the stage | Count
Stage Shuffle Read Bytes | databricks.spark.job.stage.shuffle_read_bytes | Total bytes read from other executors during shuffle operations | Byte
Stage Shuffle Read Records | databricks.spark.job.stage.shuffle_read_records | Total records read from other executors during shuffle operations | Count
Stage Shuffle Write Bytes | databricks.spark.job.stage.shuffle_write_bytes | Total bytes written to other executors during shuffle operations | Byte
Stage Shuffle Write Records | databricks.spark.job.stage.shuffle_write_records | Total records written to other executors during shuffle operations | Count
Stage Memory Bytes Spilled | databricks.spark.job.stage.memory_bytes_spilled | Amount of data spilled to memory due to shuffle or aggregation operations | Byte
Stage Disk Bytes Spilled | databricks.spark.job.stage.disk_bytes_spilled | Amount of data spilled to disk due to insufficient memory during task execution | Byte
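
The per-stage counters above correspond to fields on Spark's stages endpoint, which also accepts a status filter. A minimal sketch listing active stages:

```python
import requests

SPARK_API = "http://localhost:4040/api/v1"  # assumption: default Spark UI port on the driver

app_id = requests.get(f"{SPARK_API}/applications", timeout=10).json()[0]["id"]
stages = requests.get(f"{SPARK_API}/applications/{app_id}/stages",
                      params={"status": "active"}, timeout=10).json()
for s in stages:
    print(s["stageId"], s["numActiveTasks"], s["shuffleReadBytes"],
          s["shuffleWriteBytes"], s["memoryBytesSpilled"])
```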
Metric name | Metric key | Description | Unit
Job Status | databricks.spark.job.status | Current status of the job (e.g., running, succeeded, failed) | Unspecified
Job Duration | databricks.spark.job.duration | Total time taken by the job from start to finish | Second
Job Total Tasks | databricks.spark.job.total_tasks | Total number of tasks planned for the job | Count
Job Active Tasks | databricks.spark.job.active_tasks | Number of tasks currently executing within the job | Count
Job Skipped Tasks | databricks.spark.job.skipped_tasks | Number of tasks skipped due to earlier failures or optimizations | Count
Job Failed Tasks | databricks.spark.job.failed_tasks | Number of tasks that failed during job execution | Count
Job Completed Tasks | databricks.spark.job.completed_tasks | Total number of tasks that have successfully completed | Count
Job Active Stages | databricks.spark.job.active_stages | Number of stages currently running in a Spark job | Count
Job Completed Stages | databricks.spark.job.completed_stages | Total number of stages that have successfully completed | Count
Job Skipped Stages | databricks.spark.job.skipped_stages | Number of stages skipped due to earlier failures or optimizations | Count
Job Failed Stages | databricks.spark.job.failed_stages | Number of stages that failed during job execution | Unspecified
Job Count | databricks.spark.job_count.gauge | Total number of Spark jobs submitted | Count
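
Likewise, the job-level counters map onto Spark's jobs endpoint. A minimal sketch summarizing job outcomes, under the same driver/port assumption:

```python
import requests
from collections import Counter

SPARK_API = "http://localhost:4040/api/v1"  # assumption: default Spark UI port on the driver

app_id = requests.get(f"{SPARK_API}/applications", timeout=10).json()[0]["id"]
jobs = requests.get(f"{SPARK_API}/applications/{app_id}/jobs", timeout=10).json()
print(Counter(j["status"] for j in jobs))           # e.g. RUNNING / SUCCEEDED / FAILED
print("failed tasks:", sum(j["numFailedTasks"] for j in jobs))
```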
Metric name | Metric key | Description | Unit
CPU User % | databricks.hardware.cpu.usr | Percentage of CPU time spent on user processes | Percent
CPU Nice % | databricks.hardware.cpu.nice | Percentage of CPU time used by processes with a positive niceness, meaning a lower priority than other tasks | Percent
CPU System % | databricks.hardware.cpu.sys | Percentage of CPU time spent on system processes | Percent
CPU IOWait % | databricks.hardware.cpu.iowait | Percentage of time the CPU spends idle while waiting for I/O operations to complete | Percent
CPU IRQ % | databricks.hardware.cpu.irq | Proportion of CPU time spent handling hardware interrupt requests | Percent
CPU Steal % | databricks.hardware.cpu.steal | Percentage of time a virtual CPU waits for the physical CPU while the hypervisor services another virtual processor | Percent
CPU Idle % | databricks.hardware.cpu.idle | Percentage of CPU time spent idle | Percent
Memory Used | databricks.hardware.mem.used | Total memory currently in use, including buffers and cache | Byte
Memory Total | databricks.hardware.mem.total | Total physical memory installed on the system | KiloByte
Memory Free | databricks.hardware.mem.free | Portion of memory that is completely unused and available | KiloByte
Memory Buff/Cache | databricks.hardware.mem.buff_cache | Memory used by the system for buffers and cache to improve performance | KiloByte
Memory Shared | databricks.hardware.mem.shared | Memory shared between processes | KiloByte
Memory Available | databricks.hardware.mem.available | Total amount of memory available for use by the system | KiloByte
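
Equivalent figures can be sampled from the OS on the node itself. A minimal sketch using psutil (an assumption for illustration, not something the extension requires; the iowait, steal, and shared fields are Linux-specific):

```python
import psutil  # assumption: psutil is installed on the node being sampled

cpu = psutil.cpu_times_percent(interval=1)  # user, nice, system, iowait, irq, steal, idle (Linux)
mem = psutil.virtual_memory()               # total, used, free, shared, available, ...

print(f"cpu: usr={cpu.user}% sys={cpu.system}% iowait={cpu.iowait}% idle={cpu.idle}%")
print(f"mem: used={mem.used} free={mem.free} shared={mem.shared} available={mem.available}")
```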
Metric name | Metric key | Description | Unit
Executor RDD Blocks | databricks.spark.executor.rdd_blocks | Number of Resilient Distributed Dataset blocks stored in memory or disk by the executor | Count
Executor Memory Used | databricks.spark.executor.memory_used | The amount of memory currently used by the executor for execution and storage tasks | Byte
Executor Disk Used | databricks.spark.executor.disk_used | Disk used by the Spark executor | Byte
Executor Active Tasks | databricks.spark.executor.active_tasks | Total number of tasks currently executing on the specified executor within the Databricks Cluster | Count
Executor Failed Tasks | databricks.spark.executor.failed_tasks | Number of failed tasks on the Spark executor | Count
Executor Completed Tasks | databricks.spark.executor.completed_tasks | Number of completed tasks on the Spark application | Count
Executor Total Tasks | databricks.spark.executor.total_tasks | Total number of tasks executed by the executor | Count
Executor Duration | databricks.spark.executor.total_duration.count | Time taken by the Spark executor to complete a task | MilliSecond
Executor Input Bytes | databricks.spark.executor.total_input_bytes.count | Total number of bytes read by a Spark task from its input source | Byte
Executor Shuffle Read | databricks.spark.executor.total_shuffle_read.count | Total data read by the executor during shuffle operations (from other executors) | Byte
Executor Shuffle Write | databricks.spark.executor.total_shuffle_write.count | Total data written by the executor during shuffle operations (to other executors) | Byte
Executor Max Memory | databricks.spark.executor.max_memory | The maximum amount of memory allocated to the executor by Spark | Byte
Executor Alive Count | databricks.spark.executor.alive_count.gauge | Number of executors currently alive on the Databricks Cluster | Count
Executor Dead Count | databricks.spark.executor.dead_count.gauge | Number of dead executors on the Spark application | Count
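
Executor-level figures, including alive/dead counts, can be derived from Spark's allexecutors endpoint, which (unlike the executors endpoint) also returns dead executors. A minimal sketch:

```python
import requests

SPARK_API = "http://localhost:4040/api/v1"  # assumption: default Spark UI port on the driver

app_id = requests.get(f"{SPARK_API}/applications", timeout=10).json()[0]["id"]
execs = requests.get(f"{SPARK_API}/applications/{app_id}/allexecutors", timeout=10).json()
alive = [e for e in execs if e["isActive"]]
print("alive:", len(alive), "dead:", len(execs) - len(alive))
for e in alive:
    print(e["id"], e["rddBlocks"], e["memoryUsed"], e["activeTasks"], e["totalShuffleRead"])
```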
Metric name | Metric key | Description | Unit
Databricks Cluster Upsizing Time | databricks.cluster.upsizing_time | Time spent upsizing the cluster | MilliSecond
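
A figure like upsizing time could be approximated from the Databricks cluster events API by pairing a RESIZING event with the UPSIZE_COMPLETED event that follows it. A rough sketch with placeholder workspace URL, token, and cluster ID; a real collector would need paging and edge-case handling:

```python
import requests

# Placeholders: your workspace URL, a personal access token, and a cluster ID.
HOST = "https://example.cloud.databricks.com"
TOKEN = "dapiEXAMPLE"
CLUSTER_ID = "0101-123456-abcdef"

resp = requests.post(
    f"{HOST}/api/2.0/clusters/events",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"cluster_id": CLUSTER_ID, "event_types": ["RESIZING", "UPSIZE_COMPLETED"]},
    timeout=10,
)
resp.raise_for_status()
events = resp.json().get("events", [])  # returned newest first
by_type = {e["type"]: e["timestamp"] for e in events[:2]}
if {"RESIZING", "UPSIZE_COMPLETED"} <= by_type.keys():
    print("last upsizing took", by_type["UPSIZE_COMPLETED"] - by_type["RESIZING"], "ms")
```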

Related to Databricks

Databricks Workspace

Remotely monitor your Databricks Workspaces!

Full version history

For more information on how to install the downloaded package, follow the instructions on this page.

1.6.2

  • Added descriptions to all metrics
  • Added databricks.hardware.mem.shared and databricks.hardware.mem.available to the Hardware Metrics feature set, and databricks.spark.job_count.gauge to the Spark Job Metrics feature set

1.6.1

  • Improved error handling (Dynatrace Error Codes) and endpoint statuses.
  • NOTE: This version requires OneAgent version 1.313 or newer.

1.6.0

  • DXS-3317
    • Add Host Injections for Platform Screens
    • Add new Platform Dashboard

1.5.6

  • DXS-3250
    • Update Library Versions

1.5.5

  • New Feature Set - Hardware Metrics
  • DXS-1597
    • Adds new configuration option - Aggregate Dimensions for Spark API Metrics
  • Updates to how Spark API is called
  • UA Screen updates
  • DXS-1920
    • Adds retry logic to determine driver node during startup of extension
  • Adds ability to ingest Spark Jobs as traces
    • NOTE: Depending on the number of Spark Jobs, this could be a significant amount of traces and could increase licensing costs.
  • Adds ability to ingest Spark Config as Log Messages

1.02

  • Initial release of the Extensions 2.0 version of the Databricks extension
  • Offers support for Ganglia APIs (legacy), Apache Spark APIs, and Databricks APIs