Dynatrace Hub

Databricks Workspace

Remotely monitor your Databricks Workspaces.

Extension
  • Product information
  • Release notes

Overview

With Dynatrace, you can remotely monitor your Databricks Workspaces. This extension works in harmony with the OneAgent-based Databricks extension, and is also ideal for workspaces and clusters where the OneAgent cannot be installed, such as Databricks Serverless compute.

Use cases

  • Gather Databricks Job Run metrics including success rate and job duration
  • Understand the cost of Databricks Jobs running on All-purpose and Job Compute clusters (currently Azure Databricks is supported)
  • Ingest Job and Task run information as traces allowing for further analysis
  • Gather health metrics and detailed usage information from your Databricks model serving endpoints
  • Ingest billing data from Databricks to understand usage across workspaces, SKU & product category, jobs, and more
  • Get rightsizing recommendations based on resource utilization metrics collected from your Databricks clusters
  • Remotely capture Spark metrics from clusters to capture detailed information on jobs, tasks, stages, executors, and RDDs
  • Ingest audit logs from your workspaces
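The job-run metrics in the use cases above (success rate, job duration) can be illustrated with a small sketch. The record shape below (`state.result_state`, `run_duration` in milliseconds) mirrors the Databricks Jobs API response format, but treat the exact field names as an assumption here, and note the sample runs are hypothetical data, not real API output.

```python
# Sketch: deriving metrics like databricks.job.success_rate and
# databricks.job.duration.run from Jobs API run records.
# Field names (state.result_state, run_duration) follow the Databricks
# Jobs API response shape but are assumptions in this illustration.

def job_metrics(runs):
    """Return (success_rate_percent, avg_duration_ms) over finished runs."""
    finished = [r for r in runs if "result_state" in r.get("state", {})]
    if not finished:
        return 0.0, 0.0
    succeeded = sum(1 for r in finished
                    if r["state"]["result_state"] == "SUCCESS")
    avg_ms = sum(r.get("run_duration", 0) for r in finished) / len(finished)
    return 100.0 * succeeded / len(finished), avg_ms

# Hypothetical sample data: three successes, one failure, one still running.
sample_runs = [
    {"state": {"result_state": "SUCCESS"}, "run_duration": 60_000},
    {"state": {"result_state": "SUCCESS"}, "run_duration": 90_000},
    {"state": {"result_state": "SUCCESS"}, "run_duration": 30_000},
    {"state": {"result_state": "FAILED"}, "run_duration": 20_000},
    {"state": {"life_cycle_state": "RUNNING"}},  # unfinished: excluded
]

rate, avg = job_metrics(sample_runs)
print(rate, avg)  # 75.0 50000.0
```

In the real extension these values are computed from polled API data; the sketch only shows the arithmetic behind a success-rate and duration metric.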

Get started

For more information on installation and configuration, see the Databricks Workspace extension page in the Dynatrace Documentation.

Details

Compatibility information

Note: Full details are listed in the documentation.

Databricks API version 2.2 is used for the APIs below:

  • List job runs
  • Get a single job

API version 2.1 is used for the following:

  • Get cluster info

API version 2.0 is used for the following:

  • Get all serving endpoints
  • Get metrics of a serving endpoint
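As a rough illustration of the calls listed above, the sketch below only builds request URLs (no network access, no authentication). The paths follow the documented Databricks REST API layout (`/api/<version>/<resource>`), but verify them against the current API reference before relying on them; the workspace host is a placeholder.

```python
# Sketch: URL construction for the versioned Databricks REST endpoints
# named in the compatibility list. No requests are made; the host is
# a hypothetical placeholder and paths should be checked against the
# official Databricks API reference.
from urllib.parse import urlencode

HOST = "https://example.cloud.databricks.com"  # hypothetical workspace URL

def api_url(version, path, **params):
    """Build <host>/api/<version>/<path>?<query> for a REST call."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{HOST}/api/{version}/{path}{query}"

list_runs = api_url("2.2", "jobs/runs/list", job_id=123, limit=25)
get_job = api_url("2.2", "jobs/get", job_id=123)
get_cluster = api_url("2.1", "clusters/get", cluster_id="abc-123")
serving = api_url("2.0", "serving-endpoints")

print(list_runs)
# https://example.cloud.databricks.com/api/2.2/jobs/runs/list?job_id=123&limit=25
```

The extension authenticates these calls with a token or OAuth credentials (see the release notes below); that part is omitted here.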

The following system tables are queried when ingesting model serving endpoint data:

  • system.access.workspaces_latest
  • system.serving.endpoint_usage
  • system.serving.served_entities

The following system tables are queried when ingesting billing and cost data:

  • system.access.workspaces_latest
  • system.billing.usage
  • system.billing.list_prices
  • system.lakeflow.jobs

To query any of the above system table data, the workspace must also have:

  • Unity Catalog enabled.
  • A SQL warehouse set up.
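A minimal sketch of how the billing tables listed above combine into a cost figure. The SQL column names (`usage_quantity`, `pricing.default`) are assumptions based on the documented system table schemas; verify them against your workspace before use. The Python helper only demonstrates the arithmetic on hypothetical rows.

```python
# Hypothetical query over the billing system tables listed above,
# executed via a SQL warehouse on a Unity Catalog-enabled workspace.
# Column names are assumptions; check the system.billing.* schemas.
BILLING_QUERY = """
SELECT u.workspace_id,
       u.sku_name,
       SUM(u.usage_quantity * p.pricing.default) AS approx_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name
GROUP BY u.workspace_id, u.sku_name
"""

def approx_cost(rows):
    """Sum cost over (usage_quantity, price_per_unit) pairs."""
    return sum(qty * price for qty, price in rows)

# Hypothetical rows: 10 DBUs at 0.5/DBU plus 4 DBUs at 0.25/DBU.
print(approx_cost([(10.0, 0.5), (4.0, 0.25)]))  # 6.0
```

This mirrors why a SQL warehouse and Unity Catalog are prerequisites: the extension reads usage and price data with SQL and aggregates it into the approximate cost metrics shown in the dashboards.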

Feature sets

Below is a complete list of the feature sets provided in this version. To ensure a good fit for your needs, individual feature sets can be activated and deactivated by your administrator during configuration.

Job cost metrics

| Metric name | Metric key | Description | Unit |
|---|---|---|---|
| Job Cost (Approx) | databricks.job.cost | - | Unspecified |

Spark stage metrics

| Metric name | Metric key | Description | Unit |
|---|---|---|---|
| Stage Active Tasks | databricks.cluster.spark.job.stage.num_active_tasks | Number of tasks currently running in the stage | Count |
| Stage Completed Tasks | databricks.cluster.spark.job.stage.num_complete_tasks | Number of tasks that have successfully completed in the stage | Count |
| Stage Failed Tasks | databricks.cluster.spark.job.stage.num_failed_tasks | Number of tasks that failed during execution in the stage | Count |
| Stage Killed Tasks | databricks.cluster.spark.job.stage.num_killed_tasks | Number of tasks that were killed (e.g., due to job cancellation or speculative execution) | Count |
| Stage Executor Run Time | databricks.cluster.spark.job.stage.executor_run_time | Total time executors spent running tasks in the stage | MilliSecond |
| Stage Input Bytes | databricks.cluster.spark.job.stage.input_bytes | Total number of bytes read from input sources in the stage | Byte |
| Stage Input Records | databricks.cluster.spark.job.stage.input_records | Total number of records read from input sources in the stage | Count |
| Stage Output Bytes | databricks.cluster.spark.job.stage.output_bytes | Total number of bytes written to output destinations in the stage | Byte |
| Stage Output Records | databricks.cluster.spark.job.stage.output_records | Total number of records written to output destinations in the stage | Count |
| Stage Shuffle Read Bytes | databricks.cluster.spark.job.stage.shuffle_read_bytes | Total bytes read from other executors during shuffle operations | Byte |
| Stage Shuffle Read Records | databricks.cluster.spark.job.stage.shuffle_read_records | Total records read from other executors during shuffle operations | Count |
| Stage Shuffle Write Bytes | databricks.cluster.spark.job.stage.shuffle_write_bytes | Total bytes written to other executors during shuffle operations | Byte |
| Stage Shuffle Write Records | databricks.cluster.spark.job.stage.shuffle_write_records | Total records written to other executors during shuffle operations | Count |
| Stage Memory Bytes Spilled | databricks.cluster.spark.job.stage.memory_bytes_spilled | Amount of data spilled to memory due to shuffle or aggregation operations | Byte |
| Stage Disk Bytes Spilled | databricks.cluster.spark.job.stage.disk_bytes_spilled | Amount of data spilled to disk due to insufficient memory during task execution | Byte |

Spark streaming metrics

| Metric name | Metric key | Description | Unit |
|---|---|---|---|
| Streaming Batch Duration | databricks.cluster.spark.streaming.statistics.batch_duration | Time interval configured for each streaming batch | MilliSecond |
| Streaming Receivers | databricks.cluster.spark.streaming.statistics.num_receivers | Total number of receivers configured for the streaming job | Count |
| Streaming Active Receivers | databricks.cluster.spark.streaming.statistics.num_active_receivers | Number of receivers actively ingesting data | Count |
| Streaming Inactive Receivers | databricks.cluster.spark.streaming.statistics.num_inactive_receivers | Number of receivers that are currently inactive | Count |
| Streaming Completed Batches | databricks.cluster.spark.streaming.statistics.num_total_completed_batches.count | Total number of batches that have been fully processed | Count |
| Streaming Retained Completed Batches | databricks.cluster.spark.streaming.statistics.num_retained_completed_batches.count | Number of completed batches retained in memory for monitoring or debugging | Unspecified |
| Streaming Active Batches | databricks.cluster.spark.streaming.statistics.num_active_batches | Number of streaming batches currently being processed | Count |
| Streaming Processed Records | databricks.cluster.spark.streaming.statistics.num_processed_records.count | Total number of records processed across all batches | Count |
| Streaming Received Records | databricks.cluster.spark.streaming.statistics.num_received_records.count | Total number of records received from all sources | Count |
| Streaming Avg Input Rate | databricks.cluster.spark.streaming.statistics.avg_input_rate | Average number of records received per second across batches | Byte |
| Streaming Avg Scheduling Delay | databricks.cluster.spark.streaming.statistics.avg_scheduling_delay | Average delay between batch creation and start of processing | MilliSecond |
| Streaming Avg Processing Time | databricks.cluster.spark.streaming.statistics.avg_processing_time | Average time taken to process each batch | MilliSecond |
| Streaming Avg Total Delay | databricks.cluster.spark.streaming.statistics.avg_total_delay | Average total delay from data ingestion to processing completion | MilliSecond |

Spark RDD metrics

| Metric name | Metric key | Description | Unit |
|---|---|---|---|
| RDD Count | databricks.cluster.spark.rdd_count.gauge | Total number of Resilient Distributed Datasets currently tracked by the Spark application | Count |
| RDD Partitions | databricks.cluster.spark.rdd.num_partitions | Total number of partitions across all Resilient Distributed Datasets | Count |
| RDD Cached Partitions | databricks.cluster.spark.rdd.num_cached_partitions | Number of Resilient Distributed Dataset partitions currently cached in memory or disk | Count |
| RDD Memory Used | databricks.cluster.spark.rdd.memory_used | Amount of memory used to store Resilient Distributed Dataset data | Byte |
| RDD Disk Used | databricks.cluster.spark.rdd.disk_used | Amount of disk space used to store Resilient Distributed Dataset data | Byte |

Spark job metrics

| Metric name | Metric key | Description | Unit |
|---|---|---|---|
| Job Status | databricks.cluster.spark.job.status | Current status of the job (e.g., running, succeeded, failed) | Unspecified |
| Job Duration | databricks.cluster.spark.job.duration | Total time taken by the job from start to finish | Second |
| Job Total Tasks | databricks.cluster.spark.job.total_tasks | Total number of tasks planned for the job | Count |
| Job Active Tasks | databricks.cluster.spark.job.active_tasks | Number of tasks currently executing within the job | Count |
| Job Skipped Tasks | databricks.cluster.spark.job.skipped_tasks | Number of tasks skipped due to earlier failures or optimizations | Count |
| Job Failed Tasks | databricks.cluster.spark.job.failed_tasks | Number of tasks that failed during job execution | Count |
| Job Completed Tasks | databricks.cluster.spark.job.completed_tasks | Total number of tasks that have successfully completed | Count |
| Job Active Stages | databricks.cluster.spark.job.active_stages | Number of stages currently running in a Spark job | Count |
| Job Completed Stages | databricks.cluster.spark.job.completed_stages | Total number of stages that have successfully completed | Count |
| Job Skipped Stages | databricks.cluster.spark.job.skipped_stages | Number of stages skipped due to earlier failures or optimizations | Count |
| Job Failed Stages | databricks.cluster.spark.job.failed_stages | Number of stages that failed during job execution | Unspecified |
| Job Count | databricks.cluster.spark.job_count.gauge | Total number of Spark jobs submitted | Count |

Databricks job run metrics

| Metric name | Metric key | Description | Unit |
|---|---|---|---|
| Job Run Duration | databricks.job.duration.run | - | MilliSecond |
| Job Success Rate | databricks.job.success_rate | - | Percent |
| Job Runs Count | databricks.job.runs | - | Count |

Cluster resource utilization metrics

| Metric name | Metric key | Description | Unit |
|---|---|---|---|
| Cluster CPU System Percentage | databricks.compute.cpu.system | Percentage of time the CPU spent in system mode. | Percent |
| Cluster CPU User Percentage | databricks.compute.cpu.user | Percentage of time the CPU spent in userland. | Percent |
| Cluster CPU Wait Percentage | databricks.compute.cpu.wait | Percentage of time the CPU spent waiting for I/O. | Percent |
| Cluster CPU Total Percentage | databricks.compute.cpu.total | Percentage of time the CPU spent in total (including system and user time). | Percent |
| Cluster Memory Usage Percentage | databricks.compute.memory.used | Percentage of the compute's memory that was used during the time period (including memory used by background processes running on the compute). | Percent |
| Cluster Memory Swap Percentage | databricks.compute.memory.swap | Percentage of memory usage attributed to memory swap. | Percent |
| Cluster Network Sent Bytes | databricks.compute.network.sent | The number of bytes sent out in network traffic. | Byte |
| Cluster Network Received Bytes | databricks.compute.network.received | The number of received bytes from network traffic. | Byte |

Model serving endpoint metrics

| Metric name | Metric key | Description | Unit |
|---|---|---|---|
| Model Serving Endpoint Memory Usage Percentage | databricks.model_endpoint.mem_usage_percentage | - | Percent |
| Model Serving Endpoint CPU Usage Percentage | databricks.model_endpoint.cpu_usage_percentage | - | Percent |
| Model Serving Endpoint Request Count Total | databricks.model_endpoint.request_count_total | - | Count |
| Model Serving Endpoint Request 5xx Count Total | databricks.model_endpoint.request_5xx_count_total | - | Count |
| Model Serving Endpoint Provisioned Concurrent Requests Total | databricks.model_endpoint.provisioned_concurrent_requests_total | - | Count |
| Model Serving Endpoint Request 4xx Count Total | databricks.model_endpoint.request_4xx_count_total | - | Count |
| Model Serving Endpoint GPU Usage Percentage | databricks.model_endpoint.gpu_usage_percentage | - | Percent |
| Model Serving Endpoint GPU Memory Usage Percentage | databricks.model_endpoint.gpu_memory_usage_percentage | - | Percent |
| Model Serving Endpoint Average Request Latency | databricks.model_endpoint.request_latency_ms_avg | - | MilliSecond |
| Model Serving Endpoint P99 Request Latency | databricks.model_endpoint.request_latency_ms_p99 | - | MilliSecond |
| Model Serving Endpoint P95 Request Latency | databricks.model_endpoint.request_latency_ms_p95 | - | MilliSecond |

Job phase duration metrics

| Metric name | Metric key | Description | Unit |
|---|---|---|---|
| Job Setup Duration | databricks.job.duration.setup | - | MilliSecond |
| Job Execution Duration | databricks.job.duration.execution | - | MilliSecond |
| Job Cleanup Duration | databricks.job.duration.cleanup | - | MilliSecond |
| Job Queue Duration | databricks.job.duration.queue | - | MilliSecond |

Spark executor metrics

| Metric name | Metric key | Description | Unit |
|---|---|---|---|
| Executor RDD Blocks | databricks.cluster.spark.executor.rdd_blocks | Number of Resilient Distributed Dataset blocks stored in memory or disk by the executor | Count |
| Executor Memory Used | databricks.cluster.spark.executor.memory_used | The amount of memory currently used by the executor for execution and storage tasks | Byte |
| Executor Disk Used | databricks.cluster.spark.executor.disk_used | Disk used by the Spark executor | Byte |
| Executor Active Tasks | databricks.cluster.spark.executor.active_tasks | Total number of tasks that are currently executing on the specified executor within the Databricks Cluster | Count |
| Executor Failed Tasks | databricks.cluster.spark.executor.failed_tasks | Number of failed tasks on the Spark executor | Count |
| Executor Completed Tasks | databricks.cluster.spark.executor.completed_tasks | Number of completed tasks on the Spark Application | Count |
| Executor Total Tasks | databricks.cluster.spark.executor.total_tasks | Total number of tasks executed by the executor | Count |
| Executor Duration | databricks.cluster.spark.executor.total_duration.count | Time taken by Spark executor to complete a task | MilliSecond |
| Executor Input Bytes | databricks.cluster.spark.executor.total_input_bytes.count | Total number of Bytes read by a Spark task from its input source | Byte |
| Executor Shuffle Read | databricks.cluster.spark.executor.total_shuffle_read.count | Total data read by the executor during shuffle operations (from other executors) | Byte |
| Executor Shuffle Write | databricks.cluster.spark.executor.total_shuffle_write.count | Total data written by the executor during shuffle operations (to other executors) | Byte |
| Executor Max Memory | databricks.cluster.spark.executor.max_memory | The maximum amount of memory allocated to the executor by Spark | Byte |
| Executor Alive Count | databricks.cluster.spark.executor.alive_count.gauge | Number of tasks that are currently running on the Databricks Cluster | Count |
| Executor Dead Count | databricks.cluster.spark.executor.dead_count.gauge | Number of dead tasks on the Spark application | Count |

Related to Databricks Workspace


Databricks

Monitor your Databricks clusters via multiple APIs.

Full version history

For more information on how to install the downloaded package, follow the instructions on this page.


⚠️Breaking change

Existing monitoring configurations from previous versions cannot be upgraded to this version and must be recreated. New monitoring configurations are not affected.

✨New in this version:

  • Ingest audit logs from your workspaces. Includes new Databricks Audit Logs dashboard
    • Enable the Ingest Audit Logs toggle in your extension configurations
  • Report resource utilization metrics for your clusters and get rightsizing recommendations with new Databricks Cluster Details dashboard
    • Enable the Monitor cluster resource utilization toggle, and the Databricks Resource Utilization Metrics feature set in your extension configurations
  • Remotely ingest Spark metrics from your clusters to capture detailed information on jobs, tasks, stages, executors, and RDDs
    • Enable the Call Spark API toggle, and the Spark.* related feature sets in your extension configurations
  • Improved the Databricks Job Runs dashboard with additional charts, links to traces for each job, filtering by tag, and the dashboard timeframe passed to the Distributed Tracing app
  • Jobs are now tied to the clusters they run on and can be viewed on each cluster screen
  • New databricks.job.runs metric to report count of job runs
  • Improvements to billing data to break down costs by specific resource (Notebook, Pipeline, Cluster, etc.)
  • Configurable polling interval and timeout for system table queries
  • For jobs triggered as a one-time run, exclude the job ID from the reported traces to avoid hitting endpoint limits
  • Configurable demo mode added to preview extension dashboards populated with sample data
  • Feature set metadata and recommendations added

See the extension Documentation page for requirements and setup instructions to get started with these new features.


⚠️Breaking change

Existing monitoring configurations from previous versions cannot be upgraded to this version and must be recreated. New monitoring configurations are not affected.

✨New in this version:

  • Monitor model serving endpoint usage and billing data from your workspaces with new Gen3 dashboards provided for analysis. See documentation for how to get started collecting this data.
  • Added the option to report running/active jobs as logs.
  • Added the option to report job tags as metric dimensions and trace attributes.
  • Support added for OAuth when querying Databricks APIs and system tables.
  • Credential Vault support added for all secrets provided in the configuration.
  • Updated Jobs API from v2.1 to v2.2.
  • db.job.* attributes are now added to all task spans.
  • Updated all Gen3 dashboard links to point to new entity pages in the Infrastructure and Operations app.
  • Databricks workspace name added to trace service name.


v1.3.11

  • Vulnerability fix for protobuf:6.33.4 (CVE-2026-0994)


v1.3.9

  • DXS-3787
    • Update classic entity screen to remove optional dimension preventing data from being shown


v1.3.4

  • DXS-3317

    • Add Platform Dashboard
    • Add new Workspace Entity
    • Add Platform Screen
    • Add dt.security_context attribute
  • Updated how auto-detection of trace endpoint URL is done

  • Updated activation schema to allow for custom trace endpoint URL

    • Added custom Root CA path as an optional field
  • Fixes for job status metric and reporting for traces


v1.0.2

  • DXS-3253
    • Update Library Versions


v1.0.1

  • Initial version with updated Platform Dashboard link