Databricks

Monitor your Databricks clusters via their multiple APIs!

Overview

This OneAgent Extension allows you to collect metrics from your embedded Ganglia instance, the Apache Spark APIs, and/or the Databricks API on your Databricks Cluster.

NOTE: Databricks Runtime v13+ no longer supports Ganglia; use the Spark and Databricks API options in the configuration instead.
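
The Spark monitoring REST API that the extension can poll is also reachable directly on the driver node, which is useful for verifying connectivity before enabling the extension. Below is a minimal sketch, assuming the default Spark UI port 4040 on the driver; ports, TLS, and authentication vary by deployment:

```python
import requests

# Assumption: the Spark UI is served on the driver at its default port (4040).
SPARK_API = "http://localhost:4040/api/v1"

def spark_get(path: str):
    """GET a path from Spark's monitoring REST API and return parsed JSON."""
    resp = requests.get(f"{SPARK_API}{path}", timeout=10)
    resp.raise_for_status()
    return resp.json()

# Each running Spark application exposes jobs, stages, executors, and storage data.
for app in spark_get("/applications"):
    jobs = spark_get(f"/applications/{app['id']}/jobs")
    print(app["id"], "running jobs:", sum(1 for j in jobs if j["status"] == "RUNNING"))
```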

This is intended for users who:

  • Have Databricks cluster(s) on which they would like to monitor job statuses and other important job- and cluster-level metrics

  • Want to analyze uptime and autoscaling issues of their Databricks cluster(s)

This enables you to:

  • Monitor job, cluster, and infrastructure metrics
  • Detect long upscaling times
  • Detect and filter Driver and Worker types
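
Once the extension reports these series, they can be read back like any other Dynatrace metric. Below is a minimal sketch against the Dynatrace Metrics v2 API; the environment URL and token are placeholders, and the metric key is one of those listed under Feature sets below:

```python
import requests

# Placeholders: your environment URL and an API token with the metrics.read scope.
DT_ENV = "https://abc12345.live.dynatrace.com"
API_TOKEN = "dt0c01.EXAMPLE"

resp = requests.get(
    f"{DT_ENV}/api/v2/metrics/query",
    headers={"Authorization": f"Api-Token {API_TOKEN}"},
    params={"metricSelector": "databricks.spark.job.duration:avg", "resolution": "5m"},
    timeout=10,
)
resp.raise_for_status()
for series in resp.json()["result"][0]["data"]:
    print(series["dimensions"], series["values"][-5:])
```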

Extension content

Content type | Number of items included
screen injections | 18
screen metric tables | 1
metric metadata | 74
screen chart groups | 16
document dashboard | 1
screen dql table | 1
dashboards | 1

Feature sets

Below is a complete list of the feature sets provided in this version; the metrics included in each feature set are listed in the tables that follow. Individual feature sets can be activated or deactivated by your administrator during configuration to fit your needs.

Metric name | Metric key | Description | Unit
RDD Count | databricks.spark.rdd_count.gauge | Total number of Resilient Distributed Datasets currently tracked by the Spark application | Count
RDD Partitions | databricks.spark.rdd.num_partitions | Total number of partitions across all Resilient Distributed Datasets | Count
RDD Cached Partitions | databricks.spark.rdd.num_cached_partitions | Number of Resilient Distributed Dataset partitions currently cached in memory or disk | Count
RDD Memory Used | databricks.spark.rdd.memory_used | Amount of memory used to store Resilient Distributed Dataset data | Byte
RDD Disk Used | databricks.spark.rdd.disk_used | Amount of disk space used to store Resilient Distributed Dataset data | Byte
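
These RDD figures mirror what Spark's REST API reports per RDD under its storage endpoint. A minimal standalone sketch; the driver host and port are assumptions:

```python
import requests

SPARK_API = "http://localhost:4040/api/v1"  # assumption: default Spark UI port on the driver

app_id = requests.get(f"{SPARK_API}/applications", timeout=10).json()[0]["id"]
for rdd in requests.get(f"{SPARK_API}/applications/{app_id}/storage/rdd", timeout=10).json():
    # Field names as exposed by Spark's monitoring REST API.
    print(rdd["name"], rdd["numPartitions"], rdd["numCachedPartitions"],
          rdd["memoryUsed"], rdd["diskUsed"])
```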
Metric name | Metric key | Description | Unit
Streaming Batch Duration | databricks.spark.streaming.statistics.batch_duration | Time interval configured for each streaming batch | MilliSecond
Streaming Receivers | databricks.spark.streaming.statistics.num_receivers | Total number of receivers configured for the streaming job | Count
Streaming Active Receivers | databricks.spark.streaming.statistics.num_active_receivers | Number of receivers actively ingesting data | Count
Streaming Inactive Receivers | databricks.spark.streaming.statistics.num_inactive_receivers | Number of receivers that are currently inactive | Count
Streaming Completed Batches | databricks.spark.streaming.statistics.num_total_completed_batches.count | Total number of batches that have been fully processed | Count
Streaming Retained Completed Batches | databricks.spark.streaming.statistics.num_retained_completed_batches.count | Number of completed batches retained in memory for monitoring or debugging | Unspecified
Streaming Active Batches | databricks.spark.streaming.statistics.num_active_batches | Number of streaming batches currently being processed | Count
Streaming Processed Records | databricks.spark.streaming.statistics.num_processed_records.count | Total number of records processed across all batches | Count
Streaming Received Records | databricks.spark.streaming.statistics.num_received_records.count | Total number of records received from all sources | Count
Streaming Avg Input Rate | databricks.spark.streaming.statistics.avg_input_rate | Average number of records received per second across batches | Byte
Streaming Avg Scheduling Delay | databricks.spark.streaming.statistics.avg_scheduling_delay | Average delay between batch creation and start of processing | MilliSecond
Streaming Avg Processing Time | databricks.spark.streaming.statistics.avg_processing_time | Average time taken to process each batch | MilliSecond
Streaming Avg Total Delay | databricks.spark.streaming.statistics.avg_total_delay | Average total delay from data ingestion to processing completion | MilliSecond
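
For DStream-based jobs, Spark exposes these statistics at the streaming endpoint of its REST API, which responds only while a StreamingContext is active. A minimal sketch under the same driver/port assumption as above:

```python
import requests

SPARK_API = "http://localhost:4040/api/v1"  # assumption: default Spark UI port on the driver

app_id = requests.get(f"{SPARK_API}/applications", timeout=10).json()[0]["id"]
stats = requests.get(f"{SPARK_API}/applications/{app_id}/streaming/statistics",
                     timeout=10).json()
print("batch duration (ms):", stats["batchDuration"])
# Averages may be absent until at least one batch has completed, hence .get().
print("active batches:", stats["numActiveBatches"],
      "avg scheduling delay (ms):", stats.get("avgSchedulingDelay"))
```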
Metric name | Metric key | Description | Unit
Application Count | databricks.spark.application_count.gauge | Number of applications running on Databricks | Count
Metric name | Metric key | Description | Unit
Stage Active Tasks | databricks.spark.job.stage.num_active_tasks | Number of tasks currently running in the stage | Count
Stage Completed Tasks | databricks.spark.job.stage.num_complete_tasks | Number of tasks that have successfully completed in the stage | Count
Stage Failed Tasks | databricks.spark.job.stage.num_failed_tasks | Number of tasks that failed during execution in the stage | Count
Stage Killed Tasks | databricks.spark.job.stage.num_killed_tasks | Number of tasks that were killed (e.g., due to job cancellation or speculative execution) | Count
Stage Executor Run Time | databricks.spark.job.stage.executor_run_time | Total time executors spent running tasks in the stage | MilliSecond
Stage Input Bytes | databricks.spark.job.stage.input_bytes | Total number of bytes read from input sources in the stage | Byte
Stage Input Records | databricks.spark.job.stage.input_records | Total number of records read from input sources in the stage | Count
Stage Output Bytes | databricks.spark.job.stage.output_bytes | Total number of bytes written to output destinations in the stage | Byte
Stage Output Records | databricks.spark.job.stage.output_records | Total number of records written to output destinations in the stage | Count
Stage Shuffle Read Bytes | databricks.spark.job.stage.shuffle_read_bytes | Total bytes read from other executors during shuffle operations | Byte
Stage Shuffle Read Records | databricks.spark.job.stage.shuffle_read_records | Total records read from other executors during shuffle operations | Count
Stage Shuffle Write Bytes | databricks.spark.job.stage.shuffle_write_bytes | Total bytes written to other executors during shuffle operations | Byte
Stage Shuffle Write Records | databricks.spark.job.stage.shuffle_write_records | Total records written to other executors during shuffle operations | Count
Stage Memory Bytes Spilled | databricks.spark.job.stage.memory_bytes_spilled | Amount of data spilled to memory due to shuffle or aggregation operations | Byte
Stage Disk Bytes Spilled | databricks.spark.job.stage.disk_bytes_spilled | Amount of data spilled to disk due to insufficient memory during task execution | Byte
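
The per-stage counters above correspond to fields on Spark's stages endpoint, which also accepts a status filter. A minimal sketch listing active stages:

```python
import requests

SPARK_API = "http://localhost:4040/api/v1"  # assumption: default Spark UI port on the driver

app_id = requests.get(f"{SPARK_API}/applications", timeout=10).json()[0]["id"]
stages = requests.get(f"{SPARK_API}/applications/{app_id}/stages",
                      params={"status": "active"}, timeout=10).json()
for s in stages:
    print(s["stageId"], s["numActiveTasks"], s["shuffleReadBytes"],
          s["shuffleWriteBytes"], s["memoryBytesSpilled"])
```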
Metric name | Metric key | Description | Unit
Job Status | databricks.spark.job.status | Current status of the job (e.g., running, succeeded, failed) | Unspecified
Job Duration | databricks.spark.job.duration | Total time taken by the job from start to finish | Second
Job Total Tasks | databricks.spark.job.total_tasks | Total number of tasks planned for the job | Count
Job Active Tasks | databricks.spark.job.active_tasks | Number of tasks currently executing within the job | Count
Job Skipped Tasks | databricks.spark.job.skipped_tasks | Number of tasks skipped due to earlier failures or optimizations | Count
Job Failed Tasks | databricks.spark.job.failed_tasks | Number of tasks that failed during job execution | Count
Job Completed Tasks | databricks.spark.job.completed_tasks | Total number of tasks that have successfully completed | Count
Job Active Stages | databricks.spark.job.active_stages | Number of stages currently running in a Spark job | Count
Job Completed Stages | databricks.spark.job.completed_stages | Total number of stages that have successfully completed | Count
Job Skipped Stages | databricks.spark.job.skipped_stages | Number of stages skipped due to earlier failures or optimizations | Count
Job Failed Stages | databricks.spark.job.failed_stages | Number of stages that failed during job execution | Unspecified
Job Count | databricks.spark.job_count.gauge | Total number of Spark jobs submitted | Count
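
Likewise, the job-level counters map onto Spark's jobs endpoint. A minimal sketch summarizing job outcomes, under the same driver/port assumption:

```python
import requests
from collections import Counter

SPARK_API = "http://localhost:4040/api/v1"  # assumption: default Spark UI port on the driver

app_id = requests.get(f"{SPARK_API}/applications", timeout=10).json()[0]["id"]
jobs = requests.get(f"{SPARK_API}/applications/{app_id}/jobs", timeout=10).json()
print(Counter(j["status"] for j in jobs))           # e.g. RUNNING / SUCCEEDED / FAILED
print("failed tasks:", sum(j["numFailedTasks"] for j in jobs))
```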
Metric name | Metric key | Description | Unit
CPU User % | databricks.hardware.cpu.usr | Percentage of CPU time spent on user processes | Percent
CPU Nice % | databricks.hardware.cpu.nice | Percentage of CPU time used by processes with a positive niceness, meaning a lower priority than other tasks | Percent
CPU System % | databricks.hardware.cpu.sys | Percentage of CPU time spent on system processes | Percent
CPU IOWait % | databricks.hardware.cpu.iowait | Percentage of time the CPU spends idle while waiting for I/O operations to complete | Percent
CPU IRQ % | databricks.hardware.cpu.irq | Proportion of CPU time spent handling hardware interrupt requests | Percent
CPU Steal % | databricks.hardware.cpu.steal | Percentage of time a virtual CPU waits for the physical CPU while the hypervisor services another virtual processor | Percent
CPU Idle % | databricks.hardware.cpu.idle | Percentage of CPU time spent idle | Percent
Memory Used | databricks.hardware.mem.used | Total memory currently in use, including buffers and cache | Byte
Memory Total | databricks.hardware.mem.total | Total physical memory installed on the system | KiloByte
Memory Free | databricks.hardware.mem.free | Portion of memory that is completely unused and available | KiloByte
Memory Buff/Cache | databricks.hardware.mem.buff_cache | Memory used by the system for buffers and cache to improve performance | KiloByte
Memory Shared | databricks.hardware.mem.shared | Memory shared between processes | KiloByte
Memory Available | databricks.hardware.mem.available | Total amount of memory available for use by the system | KiloByte
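
Equivalent figures can be sampled from the OS on the node itself. A minimal sketch using psutil (an assumption for illustration, not something the extension requires; the iowait, steal, and shared fields are Linux-specific):

```python
import psutil  # assumption: psutil is installed on the node being sampled

cpu = psutil.cpu_times_percent(interval=1)  # user, nice, system, iowait, irq, steal, idle (Linux)
mem = psutil.virtual_memory()               # total, used, free, shared, available, ...

print(f"cpu: usr={cpu.user}% sys={cpu.system}% iowait={cpu.iowait}% idle={cpu.idle}%")
print(f"mem: used={mem.used} free={mem.free} shared={mem.shared} available={mem.available}")
```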
Metric name | Metric key | Description | Unit
Executor RDD Blocks | databricks.spark.executor.rdd_blocks | Number of Resilient Distributed Dataset blocks stored in memory or disk by the executor | Count
Executor Memory Used | databricks.spark.executor.memory_used | The amount of memory currently used by the executor for execution and storage tasks | Byte
Executor Disk Used | databricks.spark.executor.disk_used | Disk used by the Spark executor | Byte
Executor Active Tasks | databricks.spark.executor.active_tasks | Total number of tasks currently executing on the specified executor within the Databricks Cluster | Count
Executor Failed Tasks | databricks.spark.executor.failed_tasks | Number of failed tasks on the Spark executor | Count
Executor Completed Tasks | databricks.spark.executor.completed_tasks | Number of completed tasks on the Spark application | Count
Executor Total Tasks | databricks.spark.executor.total_tasks | Total number of tasks executed by the executor | Count
Executor Duration | databricks.spark.executor.total_duration.count | Time taken by the Spark executor to complete a task | MilliSecond
Executor Input Bytes | databricks.spark.executor.total_input_bytes.count | Total number of bytes read by a Spark task from its input source | Byte
Executor Shuffle Read | databricks.spark.executor.total_shuffle_read.count | Total data read by the executor during shuffle operations (from other executors) | Byte
Executor Shuffle Write | databricks.spark.executor.total_shuffle_write.count | Total data written by the executor during shuffle operations (to other executors) | Byte
Executor Max Memory | databricks.spark.executor.max_memory | The maximum amount of memory allocated to the executor by Spark | Byte
Executor Alive Count | databricks.spark.executor.alive_count.gauge | Number of executors currently alive on the Databricks Cluster | Count
Executor Dead Count | databricks.spark.executor.dead_count.gauge | Number of dead executors on the Spark application | Count
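
Executor-level figures, including alive/dead counts, can be derived from Spark's allexecutors endpoint, which (unlike the executors endpoint) also returns dead executors. A minimal sketch:

```python
import requests

SPARK_API = "http://localhost:4040/api/v1"  # assumption: default Spark UI port on the driver

app_id = requests.get(f"{SPARK_API}/applications", timeout=10).json()[0]["id"]
execs = requests.get(f"{SPARK_API}/applications/{app_id}/allexecutors", timeout=10).json()
alive = [e for e in execs if e["isActive"]]
print("alive:", len(alive), "dead:", len(execs) - len(alive))
for e in alive:
    print(e["id"], e["rddBlocks"], e["memoryUsed"], e["activeTasks"], e["totalShuffleRead"])
```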
Metric name | Metric key | Description | Unit
Databricks Cluster Upsizing Time | databricks.cluster.upsizing_time | Time spent upsizing the cluster | MilliSecond
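
A figure like upsizing time could be approximated from the Databricks cluster events API by pairing a RESIZING event with the UPSIZE_COMPLETED event that follows it. A rough sketch with placeholder workspace URL, token, and cluster ID; a real collector would need paging and edge-case handling:

```python
import requests

# Placeholders: your workspace URL, a personal access token, and a cluster ID.
HOST = "https://example.cloud.databricks.com"
TOKEN = "dapiEXAMPLE"
CLUSTER_ID = "0101-123456-abcdef"

resp = requests.post(
    f"{HOST}/api/2.0/clusters/events",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"cluster_id": CLUSTER_ID, "event_types": ["RESIZING", "UPSIZE_COMPLETED"]},
    timeout=10,
)
resp.raise_for_status()
events = resp.json().get("events", [])  # returned newest first
by_type = {e["type"]: e["timestamp"] for e in events[:2]}
if {"RESIZING", "UPSIZE_COMPLETED"} <= by_type.keys():
    print("last upsizing took", by_type["UPSIZE_COMPLETED"] - by_type["RESIZING"], "ms")
```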

Related to Databricks

Databricks Workspace

Remotely monitor your Databricks Workspaces!

Full version history

For more information on how to install the downloaded package, follow the instructions on this page.

1.6.2

  • Added descriptions to all metrics
  • Added databricks.hardware.mem.shared and databricks.hardware.mem.available to the Hardware Metrics feature set, and databricks.spark.job_count.gauge to the Spark Job Metrics feature set

1.6.1

  • Improved error handling (Dynatrace Error Codes) and endpoint statuses.
  • NOTE: This version requires OneAgent version 1.313 or newer.

1.6.0

  • DXS-3317
    • Add Host Injections for Platform Screens
    • Add new Platform Dashboard

1.5.6

  • DXS-3250
    • Update Library Versions

1.5.5

  • New Feature Set - Hardware Metrics
  • DXS-1597
    • Adds new configuration option - Aggregate Dimensions for Spark API Metrics
  • Updates to how Spark API is called
  • UA Screen updates
  • DXS-1920
    • Adds retry logic to determine driver node during startup of extension
  • Adds ability to ingest Spark Jobs as traces
    • NOTE: Depending on the number of Spark Jobs, this could be a significant amount of traces and could increase licensing costs.
  • Adds ability to ingest Spark Config as Log Messages

1.02

  • Initial release of the Extensions 2.0 version of the Databricks extension
  • Offers support for Ganglia APIs (legacy), Apache Spark APIs, and Databricks APIs