Databricks

Monitor your Databricks Clusters via its multiple APIs!

Overview

This OneAgent Extension allows you to collect metrics from your embedded Ganglia instance, the Apache Spark APIs, and/or the Databricks API on your Databricks Cluster.

NOTE: Databricks Runtime v13+ no longer supports Ganglia. Use the Spark and Databricks API options in the configuration instead.

This is intended for users who:

  • Have Databricks cluster(s) for which they would like to monitor job statuses and other important job-level and cluster-level metrics

  • Want to analyze uptime and autoscaling issues of their Databricks cluster(s)

This enables you to:

  • Monitor job, cluster, and infrastructure metrics
  • Detect long upscaling times
  • Detect and filter Driver and Worker types
By Dynatrace

Extension content

Content type | Number of items included
screen injections | 18
screen metric tables | 1
metric metadata | 74
screen chart groups | 16
document dashboard | 1
screen dql table | 1
dashboards | 1

Feature sets

Below is a complete list of the feature sets provided in this version. To ensure a good fit for your needs, individual feature sets can be activated and deactivated by your administrator during configuration.

Metric name | Metric key | Description | Unit
RDD Count | databricks.spark.rdd_count.gauge | Total number of Resilient Distributed Datasets currently tracked by the Spark application | Count
RDD Partitions | databricks.spark.rdd.num_partitions | Total number of partitions across all Resilient Distributed Datasets | Count
RDD Cached Partitions | databricks.spark.rdd.num_cached_partitions | Number of Resilient Distributed Dataset partitions currently cached in memory or disk | Count
RDD Memory Used | databricks.spark.rdd.memory_used | Amount of memory used to store Resilient Distributed Dataset data | Byte
RDD Disk Used | databricks.spark.rdd.disk_used | Amount of disk space used to store Resilient Distributed Dataset data | Byte

Metric name | Metric key | Description | Unit
Streaming Batch Duration | databricks.spark.streaming.statistics.batch_duration | Time interval configured for each streaming batch | MilliSecond
Streaming Receivers | databricks.spark.streaming.statistics.num_receivers | Total number of receivers configured for the streaming job | Count
Streaming Active Receivers | databricks.spark.streaming.statistics.num_active_receivers | Number of receivers actively ingesting data | Count
Streaming Inactive Receivers | databricks.spark.streaming.statistics.num_inactive_receivers | Number of receivers that are currently inactive | Count
Streaming Completed Batches | databricks.spark.streaming.statistics.num_total_completed_batches.count | Total number of batches that have been fully processed | Count
Streaming Retained Completed Batches | databricks.spark.streaming.statistics.num_retained_completed_batches.count | Number of completed batches retained in memory for monitoring or debugging | Unspecified
Streaming Active Batches | databricks.spark.streaming.statistics.num_active_batches | Number of streaming batches currently being processed | Count
Streaming Processed Records | databricks.spark.streaming.statistics.num_processed_records.count | Total number of records processed across all batches | Count
Streaming Received Records | databricks.spark.streaming.statistics.num_received_records.count | Total number of records received from all sources | Count
Streaming Avg Input Rate | databricks.spark.streaming.statistics.avg_input_rate | Average number of records received per second across batches | Byte
Streaming Avg Scheduling Delay | databricks.spark.streaming.statistics.avg_scheduling_delay | Average delay between batch creation and start of processing | MilliSecond
Streaming Avg Processing Time | databricks.spark.streaming.statistics.avg_processing_time | Average time taken to process each batch | MilliSecond
Streaming Avg Total Delay | databricks.spark.streaming.statistics.avg_total_delay | Average total delay from data ingestion to processing completion | MilliSecond

Metric name | Metric key | Description | Unit
Application Count | databricks.spark.application_count.gauge | Number of applications running on Databricks | Count

Metric name | Metric key | Description | Unit
Stage Active Tasks | databricks.spark.job.stage.num_active_tasks | Number of tasks currently running in the stage | Count
Stage Completed Tasks | databricks.spark.job.stage.num_complete_tasks | Number of tasks that have successfully completed in the stage | Count
Stage Failed Tasks | databricks.spark.job.stage.num_failed_tasks | Number of tasks that failed during execution in the stage | Count
Stage Killed Tasks | databricks.spark.job.stage.num_killed_tasks | Number of tasks that were killed (e.g., due to job cancellation or speculative execution) | Count
Stage Executor Run Time | databricks.spark.job.stage.executor_run_time | Total time executors spent running tasks in the stage | MilliSecond
Stage Input Bytes | databricks.spark.job.stage.input_bytes | Total number of bytes read from input sources in the stage | Byte
Stage Input Records | databricks.spark.job.stage.input_records | Total number of records read from input sources in the stage | Count
Stage Output Bytes | databricks.spark.job.stage.output_bytes | Total number of bytes written to output destinations in the stage | Byte
Stage Output Records | databricks.spark.job.stage.output_records | Total number of records written to output destinations in the stage | Count
Stage Shuffle Read Bytes | databricks.spark.job.stage.shuffle_read_bytes | Total bytes read from other executors during shuffle operations | Byte
Stage Shuffle Read Records | databricks.spark.job.stage.shuffle_read_records | Total records read from other executors during shuffle operations | Count
Stage Shuffle Write Bytes | databricks.spark.job.stage.shuffle_write_bytes | Total bytes written to other executors during shuffle operations | Byte
Stage Shuffle Write Records | databricks.spark.job.stage.shuffle_write_records | Total records written to other executors during shuffle operations | Count
Stage Memory Bytes Spilled | databricks.spark.job.stage.memory_bytes_spilled | Amount of data spilled to memory due to shuffle or aggregation operations | Byte
Stage Disk Bytes Spilled | databricks.spark.job.stage.disk_bytes_spilled | Amount of data spilled to disk due to insufficient memory during task execution | Byte

Metric name | Metric key | Description | Unit
Job Status | databricks.spark.job.status | Current status of the job (e.g., running, succeeded, failed) | Unspecified
Job Duration | databricks.spark.job.duration | Total time taken by the job from start to finish | Second
Job Total Tasks | databricks.spark.job.total_tasks | Total number of tasks planned for the job | Count
Job Active Tasks | databricks.spark.job.active_tasks | Number of tasks currently executing within the job | Count
Job Skipped Tasks | databricks.spark.job.skipped_tasks | Number of tasks skipped due to earlier failures or optimizations | Count
Job Failed Tasks | databricks.spark.job.failed_tasks | Number of tasks that failed during job execution | Count
Job Completed Tasks | databricks.spark.job.completed_tasks | Total number of tasks that have successfully completed | Count
Job Active Stages | databricks.spark.job.active_stages | Number of stages currently running in a Spark job | Count
Job Completed Stages | databricks.spark.job.completed_stages | Total number of stages that have successfully completed | Count
Job Skipped Stages | databricks.spark.job.skipped_stages | Number of stages skipped due to earlier failures or optimizations | Count
Job Failed Stages | databricks.spark.job.failed_stages | Number of stages that failed during job execution | Unspecified
Job Count | databricks.spark.job_count.gauge | Total number of Spark jobs submitted | Count
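
The job-level task counters in the table above can be combined into a simple failure ratio, for example to decide when a job deserves a closer look. The following is a minimal sketch, not part of the extension: the `JobSample` class, the `failed_task_ratio` function, and the sample values are all hypothetical, and only the metric keys named in the comments come from the table.

```python
from dataclasses import dataclass


@dataclass
class JobSample:
    """One reading of the job metrics (hypothetical container, not part of the extension)."""
    total_tasks: int      # databricks.spark.job.total_tasks
    failed_tasks: int     # databricks.spark.job.failed_tasks
    completed_tasks: int  # databricks.spark.job.completed_tasks


def failed_task_ratio(sample: JobSample) -> float:
    """Return failed tasks as a fraction of planned tasks (0.0 when none are planned)."""
    if sample.total_tasks <= 0:
        return 0.0
    return sample.failed_tasks / sample.total_tasks


# Made-up values for illustration only.
sample = JobSample(total_tasks=200, failed_tasks=5, completed_tasks=150)
print(f"{failed_task_ratio(sample):.3f}")  # prints 0.025
```

Guarding against a zero task count keeps the helper safe for jobs that have been submitted but not yet planned.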

Metric name | Metric key | Description | Unit
CPU User % | databricks.hardware.cpu.usr | Percentage of CPU time spent on user processes | Percent
CPU Nice % | databricks.hardware.cpu.nice | Percentage of CPU time used by processes that have a positive niceness, meaning a lower priority than other tasks | Percent
CPU System % | databricks.hardware.cpu.sys | Percentage of CPU time spent on system processes | Percent
CPU IOWait % | databricks.hardware.cpu.iowait | Percentage of time the CPU spends idle while waiting for I/O operations to complete | Percent
CPU IRQ % | databricks.hardware.cpu.irq | Percentage of CPU time spent handling hardware interrupt requests | Percent
CPU Steal % | databricks.hardware.cpu.steal | Percentage of time a virtual CPU waits for a physical CPU while the hypervisor is servicing another virtual processor | Percent
CPU Idle % | databricks.hardware.cpu.idle | Percentage of time the CPU is idle | Percent
Memory Used | databricks.hardware.mem.used | Total memory currently in use, including buffers and cache | Byte
Memory Total | databricks.hardware.mem.total | Total physical memory installed on the system | KiloByte
Memory Free | databricks.hardware.mem.free | Portion of memory that is completely unused and available | KiloByte
Memory Buff/Cache | databricks.hardware.mem.buff_cache | Memory used by the system for buffers and cache to improve performance | KiloByte
Memory Shared | databricks.hardware.mem.shared | Memory shared between processes | KiloByte
Memory Available | databricks.hardware.mem.available | Total amount of memory available for use by the system | KiloByte
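
Note that the table lists Memory Used in Byte while the other memory metrics are in KiloByte, so values need a common unit before they are compared. The sketch below shows one way to do that. It assumes 1 KiloByte equals 1,000 bytes (change the constant if your values are really 1,024-byte units); the function name and sample numbers are hypothetical.

```python
BYTES_PER_KILOBYTE = 1000  # assumption; use 1024 if the values are really kibibytes


def memory_used_percent(used_bytes: float, total_kilobytes: float) -> float:
    """Percentage of memory in use.

    used_bytes      -- a databricks.hardware.mem.used reading (Byte)
    total_kilobytes -- a databricks.hardware.mem.total reading (KiloByte)
    """
    total_bytes = total_kilobytes * BYTES_PER_KILOBYTE
    if total_bytes <= 0:
        raise ValueError("total memory must be positive")
    return 100.0 * used_bytes / total_bytes


# Made-up sample: 8,000,000,000 bytes used out of 16,000,000 KB total.
print(memory_used_percent(used_bytes=8_000_000_000, total_kilobytes=16_000_000))  # prints 50.0
```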

Metric name | Metric key | Description | Unit
Executor RDD Blocks | databricks.spark.executor.rdd_blocks | Number of Resilient Distributed Dataset blocks stored in memory or disk by the executor | Count
Executor Memory Used | databricks.spark.executor.memory_used | Amount of memory currently used by the executor for execution and storage tasks | Byte
Executor Disk Used | databricks.spark.executor.disk_used | Disk space used by the Spark executor | Byte
Executor Active Tasks | databricks.spark.executor.active_tasks | Total number of tasks currently executing on the specified executor within the Databricks cluster | Count
Executor Failed Tasks | databricks.spark.executor.failed_tasks | Number of failed tasks on the Spark executor | Count
Executor Completed Tasks | databricks.spark.executor.completed_tasks | Number of completed tasks on the Spark executor | Count
Executor Total Tasks | databricks.spark.executor.total_tasks | Total number of tasks executed by the executor | Count
Executor Duration | databricks.spark.executor.total_duration.count | Time taken by the Spark executor to complete tasks | MilliSecond
Executor Input Bytes | databricks.spark.executor.total_input_bytes.count | Total number of bytes read by Spark tasks from their input source | Byte
Executor Shuffle Read | databricks.spark.executor.total_shuffle_read.count | Total data read by the executor during shuffle operations (from other executors) | Byte
Executor Shuffle Write | databricks.spark.executor.total_shuffle_write.count | Total data written by the executor during shuffle operations (to other executors) | Byte
Executor Max Memory | databricks.spark.executor.max_memory | Maximum amount of memory allocated to the executor by Spark | Byte
Executor Alive Count | databricks.spark.executor.alive_count.gauge | Number of executors currently alive on the Databricks cluster | Count
Executor Dead Count | databricks.spark.executor.dead_count.gauge | Number of dead executors in the Spark application | Count
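
To make the alive and dead counts concrete, the sketch below tallies executors from a payload shaped like the one Spark's REST API returns from `/api/v1/applications/<app-id>/allexecutors`, where each entry carries an `isActive` flag. This is only an illustration: how the extension itself derives these counts (for example, whether the driver is included) is not stated on this page, the payload is made up, and `count_executors` is a hypothetical helper.

```python
import json

# Trimmed, made-up payload in the shape of Spark's "allexecutors" response.
SAMPLE = """
[
  {"id": "driver", "isActive": true,  "activeTasks": 0, "failedTasks": 0},
  {"id": "0",      "isActive": true,  "activeTasks": 3, "failedTasks": 1},
  {"id": "1",      "isActive": false, "activeTasks": 0, "failedTasks": 4}
]
"""


def count_executors(executors):
    """Return (alive, dead) counts based on each entry's isActive flag."""
    alive = sum(1 for executor in executors if executor.get("isActive"))
    return alive, len(executors) - alive


alive, dead = count_executors(json.loads(SAMPLE))
print(alive, dead)  # prints: 2 1
```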

Metric name | Metric key | Description | Unit
Databricks Cluster Upsizing Time | databricks.cluster.upsizing_time | Time spent upsizing cluster | MilliSecond
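
Detecting long upscaling times comes down to comparing this metric against a limit you consider acceptable. The following is a minimal sketch under stated assumptions: the five-minute threshold, the `long_upsizing_events` helper, and the sample readings are all hypothetical, and only the metric key and its millisecond unit come from the table.

```python
from typing import Iterable, List, Tuple

DEFAULT_THRESHOLD_MS = 5 * 60 * 1000  # hypothetical limit: 5 minutes


def long_upsizing_events(
    samples: Iterable[Tuple[str, float]],
    threshold_ms: float = DEFAULT_THRESHOLD_MS,
) -> List[Tuple[str, float]]:
    """Return the (timestamp, upsizing_time_ms) samples that exceed the threshold."""
    return [(ts, ms) for ts, ms in samples if ms > threshold_ms]


# Made-up readings of databricks.cluster.upsizing_time, in milliseconds.
readings = [
    ("2024-01-01T10:00", 90_000),
    ("2024-01-01T10:15", 420_000),
    ("2024-01-01T10:30", 300_000),
]
print(long_upsizing_events(readings))  # prints [('2024-01-01T10:15', 420000)]
```

The comparison is strict, so a reading exactly at the threshold is not flagged.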


Full version history

For more information on how to install the downloaded package, follow the instructions on this page.

1.6.2

  • Added descriptions to all metrics
  • Added databricks.hardware.mem.shared and databricks.hardware.mem.available to Hardware Metrics feature set, and databricks.spark.job_count.gauge to Spark Job Metrics feature set

1.6.1

  • Improved error handling (Dynatrace Error Codes) and Endpoint Statuses.
  • NOTE: This version requires OneAgent version 1.313 or newer.

1.6.0

  • DXS-3317
    • Add Host Injections for Platform Screens
    • Add new Platform Dashboard

v1.5.6

  • DXS-3250
    • Update Library Versions

v1.5.5

  • New Feature Set - Hardware Metrics
  • DXS-1597
    • Adds new configuration option - Aggregate Dimensions for Spark API Metrics
  • Updates to how Spark API is called
  • UA Screen updates
  • DXS-1920
    • Adds retry logic to determine the driver node during startup of the extension
  • Adds ability to ingest Spark Jobs as traces
    • NOTE: Depending on the number of Spark Jobs, this could produce a significant number of traces and could increase licensing costs.
  • Adds ability to ingest Spark Config as Log Messages

v1.02

  • Initial Release of Extensions 2.0 version of Databricks Extension
  • Offers Support for Ganglia APIs (Legacy), Apache Spark APIs, and Databricks APIs