Dynatrace Hub

Databricks Workspace

Remotely monitor your Databricks Workspaces.

Extension
  • Product information
  • Release notes

Overview

With Dynatrace, you can remotely monitor your Databricks Workspaces. This extension works in harmony with the OneAgent-based Databricks extension, and is also ideal for workspaces and clusters where the OneAgent cannot be installed, such as Databricks Serverless compute.

Use cases

  • Gather Databricks Job Run metrics including success rate and job duration
  • Understand the cost of Databricks Jobs running on All-purpose and Job Compute clusters (currently Azure Databricks is supported)
  • Ingest Job and Task run information as traces allowing for further analysis
  • Gather health metrics and detailed usage information from your Databricks model serving endpoints
  • Ingest billing data from Databricks to understand usage across workspaces, SKU & product category, jobs, and more
  • Get rightsizing recommendations based on resource utilization metrics collected from your Databricks clusters
  • Remotely capture Spark metrics from clusters to capture detailed information on jobs, tasks, stages, executors, and RDDs
  • Ingest audit logs from your workspaces
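The job-run metrics in the use cases above (success rate, job duration) can be illustrated with a small sketch. The record shape below (`state.result_state`, `run_duration` in milliseconds) mirrors the Databricks Jobs API response format, but treat the exact field names as an assumption here, and note the sample runs are hypothetical data, not real API output.

```python
# Sketch: deriving metrics like databricks.job.success_rate and
# databricks.job.duration.run from Jobs API run records.
# Field names (state.result_state, run_duration) follow the Databricks
# Jobs API response shape but are assumptions in this illustration.

def job_metrics(runs):
    """Return (success_rate_percent, avg_duration_ms) over finished runs."""
    finished = [r for r in runs if "result_state" in r.get("state", {})]
    if not finished:
        return 0.0, 0.0
    succeeded = sum(1 for r in finished
                    if r["state"]["result_state"] == "SUCCESS")
    avg_ms = sum(r.get("run_duration", 0) for r in finished) / len(finished)
    return 100.0 * succeeded / len(finished), avg_ms

# Hypothetical sample data: three successes, one failure, one still running.
sample_runs = [
    {"state": {"result_state": "SUCCESS"}, "run_duration": 60_000},
    {"state": {"result_state": "SUCCESS"}, "run_duration": 90_000},
    {"state": {"result_state": "SUCCESS"}, "run_duration": 30_000},
    {"state": {"result_state": "FAILED"}, "run_duration": 20_000},
    {"state": {"life_cycle_state": "RUNNING"}},  # unfinished: excluded
]

rate, avg = job_metrics(sample_runs)
print(rate, avg)  # 75.0 50000.0
```

In the real extension these values are computed from polled API data; the sketch only shows the arithmetic behind a success-rate and duration metric.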

Get started

For more information on installation and configuration, see the Databricks Workspace extension page in the Dynatrace Documentation.

Details

Compatibility information

Note: Full details are listed in the documentation.

Databricks API version 2.2 is used for the APIs below:

  • List job runs
  • Get a single job

API version 2.1 is used for the following:

  • Get cluster info

API version 2.0 is used for the following:

  • Get all serving endpoints
  • Get metrics of a serving endpoint
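As a rough illustration of the calls listed above, the sketch below only builds request URLs (no network access, no authentication). The paths follow the documented Databricks REST API layout (`/api/<version>/<resource>`), but verify them against the current API reference before relying on them; the workspace host is a placeholder.

```python
# Sketch: URL construction for the versioned Databricks REST endpoints
# named in the compatibility list. No requests are made; the host is
# a hypothetical placeholder and paths should be checked against the
# official Databricks API reference.
from urllib.parse import urlencode

HOST = "https://example.cloud.databricks.com"  # hypothetical workspace URL

def api_url(version, path, **params):
    """Build <host>/api/<version>/<path>?<query> for a REST call."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{HOST}/api/{version}/{path}{query}"

list_runs = api_url("2.2", "jobs/runs/list", job_id=123, limit=25)
get_job = api_url("2.2", "jobs/get", job_id=123)
get_cluster = api_url("2.1", "clusters/get", cluster_id="abc-123")
serving = api_url("2.0", "serving-endpoints")

print(list_runs)
# https://example.cloud.databricks.com/api/2.2/jobs/runs/list?job_id=123&limit=25
```

The extension authenticates these calls with a token or OAuth credentials (see the release notes below); that part is omitted here.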

The following system tables are queried when ingesting model serving endpoint data:

  • system.access.workspaces_latest
  • system.serving.endpoint_usage
  • system.serving.served_entities

The following system tables are queried when ingesting billing and cost data:

  • system.access.workspaces_latest
  • system.billing.usage
  • system.billing.list_prices
  • system.lakeflow.jobs

To query any of the above system table data, the workspace must also have:

  • Unity Catalog enabled.
  • A SQL warehouse set up.
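A minimal sketch of how the billing tables listed above combine into a cost figure. The SQL column names (`usage_quantity`, `pricing.default`) are assumptions based on the documented system table schemas; verify them against your workspace before use. The Python helper only demonstrates the arithmetic on hypothetical rows.

```python
# Hypothetical query over the billing system tables listed above,
# executed via a SQL warehouse on a Unity Catalog-enabled workspace.
# Column names are assumptions; check the system.billing.* schemas.
BILLING_QUERY = """
SELECT u.workspace_id,
       u.sku_name,
       SUM(u.usage_quantity * p.pricing.default) AS approx_cost
FROM system.billing.usage u
JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name
GROUP BY u.workspace_id, u.sku_name
"""

def approx_cost(rows):
    """Sum cost over (usage_quantity, price_per_unit) pairs."""
    return sum(qty * price for qty, price in rows)

# Hypothetical rows: 10 DBUs at 0.5/DBU plus 4 DBUs at 0.25/DBU.
print(approx_cost([(10.0, 0.5), (4.0, 0.25)]))  # 6.0
```

This mirrors why a SQL warehouse and Unity Catalog are prerequisites: the extension reads usage and price data with SQL and aggregates it into the approximate cost metrics shown in the dashboards.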

Feature sets

Below is a complete list of the feature sets provided in this version. To ensure a good fit for your needs, individual feature sets can be activated and deactivated by your administrator during configuration.

Job cost metrics

| Metric name | Metric key | Description | Unit |
|---|---|---|---|
| Job Cost (Approx) | databricks.job.cost | - | Unspecified |

Spark stage metrics

| Metric name | Metric key | Description | Unit |
|---|---|---|---|
| Stage Active Tasks | databricks.cluster.spark.job.stage.num_active_tasks | Number of tasks currently running in the stage | Count |
| Stage Completed Tasks | databricks.cluster.spark.job.stage.num_complete_tasks | Number of tasks that have successfully completed in the stage | Count |
| Stage Failed Tasks | databricks.cluster.spark.job.stage.num_failed_tasks | Number of tasks that failed during execution in the stage | Count |
| Stage Killed Tasks | databricks.cluster.spark.job.stage.num_killed_tasks | Number of tasks that were killed (e.g., due to job cancellation or speculative execution) | Count |
| Stage Executor Run Time | databricks.cluster.spark.job.stage.executor_run_time | Total time executors spent running tasks in the stage | MilliSecond |
| Stage Input Bytes | databricks.cluster.spark.job.stage.input_bytes | Total number of bytes read from input sources in the stage | Byte |
| Stage Input Records | databricks.cluster.spark.job.stage.input_records | Total number of records read from input sources in the stage | Count |
| Stage Output Bytes | databricks.cluster.spark.job.stage.output_bytes | Total number of bytes written to output destinations in the stage | Byte |
| Stage Output Records | databricks.cluster.spark.job.stage.output_records | Total number of records written to output destinations in the stage | Count |
| Stage Shuffle Read Bytes | databricks.cluster.spark.job.stage.shuffle_read_bytes | Total bytes read from other executors during shuffle operations | Byte |
| Stage Shuffle Read Records | databricks.cluster.spark.job.stage.shuffle_read_records | Total records read from other executors during shuffle operations | Count |
| Stage Shuffle Write Bytes | databricks.cluster.spark.job.stage.shuffle_write_bytes | Total bytes written to other executors during shuffle operations | Byte |
| Stage Shuffle Write Records | databricks.cluster.spark.job.stage.shuffle_write_records | Total records written to other executors during shuffle operations | Count |
| Stage Memory Bytes Spilled | databricks.cluster.spark.job.stage.memory_bytes_spilled | Amount of data spilled to memory due to shuffle or aggregation operations | Byte |
| Stage Disk Bytes Spilled | databricks.cluster.spark.job.stage.disk_bytes_spilled | Amount of data spilled to disk due to insufficient memory during task execution | Byte |

Spark streaming metrics

| Metric name | Metric key | Description | Unit |
|---|---|---|---|
| Streaming Batch Duration | databricks.cluster.spark.streaming.statistics.batch_duration | Time interval configured for each streaming batch | MilliSecond |
| Streaming Receivers | databricks.cluster.spark.streaming.statistics.num_receivers | Total number of receivers configured for the streaming job | Count |
| Streaming Active Receivers | databricks.cluster.spark.streaming.statistics.num_active_receivers | Number of receivers actively ingesting data | Count |
| Streaming Inactive Receivers | databricks.cluster.spark.streaming.statistics.num_inactive_receivers | Number of receivers that are currently inactive | Count |
| Streaming Completed Batches | databricks.cluster.spark.streaming.statistics.num_total_completed_batches.count | Total number of batches that have been fully processed | Count |
| Streaming Retained Completed Batches | databricks.cluster.spark.streaming.statistics.num_retained_completed_batches.count | Number of completed batches retained in memory for monitoring or debugging | Unspecified |
| Streaming Active Batches | databricks.cluster.spark.streaming.statistics.num_active_batches | Number of streaming batches currently being processed | Count |
| Streaming Processed Records | databricks.cluster.spark.streaming.statistics.num_processed_records.count | Total number of records processed across all batches | Count |
| Streaming Received Records | databricks.cluster.spark.streaming.statistics.num_received_records.count | Total number of records received from all sources | Count |
| Streaming Avg Input Rate | databricks.cluster.spark.streaming.statistics.avg_input_rate | Average number of records received per second across batches | Byte |
| Streaming Avg Scheduling Delay | databricks.cluster.spark.streaming.statistics.avg_scheduling_delay | Average delay between batch creation and start of processing | MilliSecond |
| Streaming Avg Processing Time | databricks.cluster.spark.streaming.statistics.avg_processing_time | Average time taken to process each batch | MilliSecond |
| Streaming Avg Total Delay | databricks.cluster.spark.streaming.statistics.avg_total_delay | Average total delay from data ingestion to processing completion | MilliSecond |

Spark RDD metrics

| Metric name | Metric key | Description | Unit |
|---|---|---|---|
| RDD Count | databricks.cluster.spark.rdd_count.gauge | Total number of Resilient Distributed Datasets currently tracked by the Spark application | Count |
| RDD Partitions | databricks.cluster.spark.rdd.num_partitions | Total number of partitions across all Resilient Distributed Datasets | Count |
| RDD Cached Partitions | databricks.cluster.spark.rdd.num_cached_partitions | Number of Resilient Distributed Dataset partitions currently cached in memory or disk | Count |
| RDD Memory Used | databricks.cluster.spark.rdd.memory_used | Amount of memory used to store Resilient Distributed Dataset data | Byte |
| RDD Disk Used | databricks.cluster.spark.rdd.disk_used | Amount of disk space used to store Resilient Distributed Dataset data | Byte |

Spark job metrics

| Metric name | Metric key | Description | Unit |
|---|---|---|---|
| Job Status | databricks.cluster.spark.job.status | Current status of the job (e.g., running, succeeded, failed) | Unspecified |
| Job Duration | databricks.cluster.spark.job.duration | Total time taken by the job from start to finish | Second |
| Job Total Tasks | databricks.cluster.spark.job.total_tasks | Total number of tasks planned for the job | Count |
| Job Active Tasks | databricks.cluster.spark.job.active_tasks | Number of tasks currently executing within the job | Count |
| Job Skipped Tasks | databricks.cluster.spark.job.skipped_tasks | Number of tasks skipped due to earlier failures or optimizations | Count |
| Job Failed Tasks | databricks.cluster.spark.job.failed_tasks | Number of tasks that failed during job execution | Count |
| Job Completed Tasks | databricks.cluster.spark.job.completed_tasks | Total number of tasks that have successfully completed | Count |
| Job Active Stages | databricks.cluster.spark.job.active_stages | Number of stages currently running in a Spark job | Count |
| Job Completed Stages | databricks.cluster.spark.job.completed_stages | Total number of stages that have successfully completed | Count |
| Job Skipped Stages | databricks.cluster.spark.job.skipped_stages | Number of stages skipped due to earlier failures or optimizations | Count |
| Job Failed Stages | databricks.cluster.spark.job.failed_stages | Number of stages that failed during job execution | Unspecified |
| Job Count | databricks.cluster.spark.job_count.gauge | Total number of Spark jobs submitted | Count |

Databricks job run metrics

| Metric name | Metric key | Description | Unit |
|---|---|---|---|
| Job Run Duration | databricks.job.duration.run | - | MilliSecond |
| Job Success Rate | databricks.job.success_rate | - | Percent |
| Job Runs Count | databricks.job.runs | - | Count |

Cluster resource utilization metrics

| Metric name | Metric key | Description | Unit |
|---|---|---|---|
| Cluster CPU System Percentage | databricks.compute.cpu.system | Percentage of time the CPU spent in system mode. | Percent |
| Cluster CPU User Percentage | databricks.compute.cpu.user | Percentage of time the CPU spent in userland. | Percent |
| Cluster CPU Wait Percentage | databricks.compute.cpu.wait | Percentage of time the CPU spent waiting for I/O. | Percent |
| Cluster CPU Total Percentage | databricks.compute.cpu.total | Percentage of time the CPU spent in total (including system and user time). | Percent |
| Cluster Memory Usage Percentage | databricks.compute.memory.used | Percentage of the compute's memory that was used during the time period (including memory used by background processes running on the compute). | Percent |
| Cluster Memory Swap Percentage | databricks.compute.memory.swap | Percentage of memory usage attributed to memory swap. | Percent |
| Cluster Network Sent Bytes | databricks.compute.network.sent | The number of bytes sent out in network traffic. | Byte |
| Cluster Network Received Bytes | databricks.compute.network.received | The number of received bytes from network traffic. | Byte |

Model serving endpoint metrics

| Metric name | Metric key | Description | Unit |
|---|---|---|---|
| Model Serving Endpoint Memory Usage Percentage | databricks.model_endpoint.mem_usage_percentage | - | Percent |
| Model Serving Endpoint CPU Usage Percentage | databricks.model_endpoint.cpu_usage_percentage | - | Percent |
| Model Serving Endpoint Request Count Total | databricks.model_endpoint.request_count_total | - | Count |
| Model Serving Endpoint Request 5xx Count Total | databricks.model_endpoint.request_5xx_count_total | - | Count |
| Model Serving Endpoint Provisioned Concurrent Requests Total | databricks.model_endpoint.provisioned_concurrent_requests_total | - | Count |
| Model Serving Endpoint Request 4xx Count Total | databricks.model_endpoint.request_4xx_count_total | - | Count |
| Model Serving Endpoint GPU Usage Percentage | databricks.model_endpoint.gpu_usage_percentage | - | Percent |
| Model Serving Endpoint GPU Memory Usage Percentage | databricks.model_endpoint.gpu_memory_usage_percentage | - | Percent |
| Model Serving Endpoint Average Request Latency | databricks.model_endpoint.request_latency_ms_avg | - | MilliSecond |
| Model Serving Endpoint P99 Request Latency | databricks.model_endpoint.request_latency_ms_p99 | - | MilliSecond |
| Model Serving Endpoint P95 Request Latency | databricks.model_endpoint.request_latency_ms_p95 | - | MilliSecond |

Job phase duration metrics

| Metric name | Metric key | Description | Unit |
|---|---|---|---|
| Job Setup Duration | databricks.job.duration.setup | - | MilliSecond |
| Job Execution Duration | databricks.job.duration.execution | - | MilliSecond |
| Job Cleanup Duration | databricks.job.duration.cleanup | - | MilliSecond |
| Job Queue Duration | databricks.job.duration.queue | - | MilliSecond |

Spark executor metrics

| Metric name | Metric key | Description | Unit |
|---|---|---|---|
| Executor RDD Blocks | databricks.cluster.spark.executor.rdd_blocks | Number of Resilient Distributed Dataset blocks stored in memory or disk by the executor | Count |
| Executor Memory Used | databricks.cluster.spark.executor.memory_used | The amount of memory currently used by the executor for execution and storage tasks | Byte |
| Executor Disk Used | databricks.cluster.spark.executor.disk_used | Disk used by the Spark executor | Byte |
| Executor Active Tasks | databricks.cluster.spark.executor.active_tasks | Total number of tasks that are currently executing on the specified executor within the Databricks Cluster | Count |
| Executor Failed Tasks | databricks.cluster.spark.executor.failed_tasks | Number of failed tasks on the Spark executor | Count |
| Executor Completed Tasks | databricks.cluster.spark.executor.completed_tasks | Number of completed tasks on the Spark Application | Count |
| Executor Total Tasks | databricks.cluster.spark.executor.total_tasks | Total number of tasks executed by the executor | Count |
| Executor Duration | databricks.cluster.spark.executor.total_duration.count | Time taken by Spark executor to complete a task | MilliSecond |
| Executor Input Bytes | databricks.cluster.spark.executor.total_input_bytes.count | Total number of Bytes read by a Spark task from its input source | Byte |
| Executor Shuffle Read | databricks.cluster.spark.executor.total_shuffle_read.count | Total data read by the executor during shuffle operations (from other executors) | Byte |
| Executor Shuffle Write | databricks.cluster.spark.executor.total_shuffle_write.count | Total data written by the executor during shuffle operations (to other executors) | Byte |
| Executor Max Memory | databricks.cluster.spark.executor.max_memory | The maximum amount of memory allocated to the executor by Spark | Byte |
| Executor Alive Count | databricks.cluster.spark.executor.alive_count.gauge | Number of tasks that are currently running on the Databricks Cluster | Count |
| Executor Dead Count | databricks.cluster.spark.executor.dead_count.gauge | Number of dead tasks on the Spark application | Count |

Related to Databricks Workspace


Databricks

Monitor your Databricks clusters via multiple APIs.

Full version history

For more information on how to install the downloaded package, follow the instructions on this page.


⚠️Breaking change

Existing monitoring configurations from previous versions cannot be upgraded to this version and must be recreated. New monitoring configurations are not affected.

✨New in this version:

  • Ingest audit logs from your workspaces. Includes new Databricks Audit Logs dashboard
    • Enable the Ingest Audit Logs toggle in your extension configurations
  • Report resource utilization metrics for your clusters and get rightsizing recommendations with new Databricks Cluster Details dashboard
    • Enable the Monitor cluster resource utilization toggle, and the Databricks Resource Utilization Metrics feature set in your extension configurations
  • Remotely ingest Spark metrics from your clusters to capture detailed information on jobs, tasks, stages, executors, and RDDs
    • Enable the Call Spark API toggle, and the Spark.* related feature sets in your extension configurations
  • Improved the Databricks Job Runs dashboard with additional charts, links to traces for each job, filtering by tag, and the dashboard timeframe passed to the Distributed Tracing app
  • Jobs are now tied to the clusters they run on and can be viewed on each cluster screen
  • New databricks.job.runs metric to report count of job runs
  • Improvements to billing data to break down costs by specific resource (Notebook, Pipeline, Cluster, etc.)
  • Configurable polling interval and timeout for system table queries
  • For jobs triggered as a one-time run, exclude the job ID from the reported traces to avoid hitting endpoint limits
  • Configurable demo mode added to preview extension dashboards populated with sample data
  • Feature set metadata and recommendations added

See the extension Documentation page for requirements and setup instructions to get started with these new features.


⚠️Breaking change

Existing monitoring configurations from previous versions cannot be upgraded to this version and must be recreated. New monitoring configurations are not affected.

✨New in this version:

  • Monitor model serving endpoint usage and billing data from your workspaces with new Gen3 dashboards provided for analysis. See documentation for how to get started collecting this data.
  • Added the option to report running/active jobs as logs.
  • Added the option to report job tags as metric dimensions and trace attributes.
  • Support added for OAuth when querying Databricks APIs and system tables.
  • Credential Vault support added for all secrets provided in the configuration.
  • Updated Jobs API from v2.1 to v2.2.
  • db.job.* attributes are now added to all task spans.
  • Updated all Gen3 dashboard links to point to new entity pages in the Infrastructure and Operations app.
  • Databricks workspace name added to trace service name.


v1.3.11

  • Vulnerability fix for protobuf:6.33.4 (CVE-2026-0994)


v1.3.9

  • DXS-3787
    • Update classic entity screen to remove optional dimension preventing data from being shown


v1.3.4

  • DXS-3317

    • Add Platform Dashboard
    • Add new Workspace Entity
    • Add Platform Screen
    • Add dt.security_context attribute
  • Updated how auto-detection of trace endpoint URL is done

  • Updated activation schema to allow for custom trace endpoint URL

    • Added custom Root CA path as an optional field
  • Fixes for job status metric and reporting for traces


v1.0.2

  • DXS-3253
    • Update Library Versions


v1.0.1

  • Initial version with updated Platform Dashboard link