Databricks

Monitor your Databricks Clusters via its multiple APIs!


Overview

This OneAgent Extension allows you to collect metrics from your embedded Ganglia instance, the Apache Spark APIs, and/or the Databricks API on your Databricks Cluster.

NOTE: Databricks Runtime v13+ no longer supports Ganglia; use the Spark and Databricks API options in the configuration instead.

This is intended for users who:

  • Have Databricks cluster(s) for which they would like to monitor job statuses and other important job- and cluster-level metrics

  • Want to analyze uptime and autoscaling issues of their Databricks cluster(s)

This enables you to:

  • Monitor job, cluster, and infrastructure metrics
  • Detect long upscaling times
  • Detect and filter Driver and Worker types

Get started

  1. Define in the configuration which metrics you'd like to collect from your Databricks Clusters

  2. Set up a global init script on your Databricks Cluster to download the Dynatrace OneAgent

  3. Start or Restart your Databricks Cluster to enable the Dynatrace OneAgent and this extension.

Details

  1. Ensure the EEC is enabled on each host. This can be done globally by turning on the first two options under
    • Settings -> Preferences -> Extension Execution Controller (/ui/settings/builtin:eec.local)

    A sketch for checking this via the Settings API follows at the end of this section.

  2. Create a Databricks API token from inside your Databricks workspace

    • User Settings -> Create API Token
  3. Copy your Databricks URL

  4. Copy the Linux OneAgent installation wget command from

    • Deploy Dynatrace -> Start Installation button -> Linux button -> Enter or Create PaaS Token (#install/agentlinux;gf=all)

NOTE: Databricks clusters can go up and down quickly, causing multiple HOST entities within Dynatrace. Databricks reuses IP addresses, so if you'd like to keep the same HOST entities for your clusters, add the flag --set-host-id-source="ip-addresses" to the OneAgent installation command in your global init script. For example:

/bin/sh Dynatrace-OneAgent-Linux.sh --set-infra-only=true --set-app-log-content-access=true --set-host-id-source="ip-addresses"
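
If you'd rather verify step 1 without clicking through the UI, the Dynatrace Settings API can list the persisted objects for the builtin:eec.local schema shown above. This is a minimal sketch, not part of the extension itself; it assumes an API token with the settings.read scope, and the <TENANT> and token placeholders are illustrative.

    # Minimal sketch: list the EEC (Extension Execution Controller) settings objects.
    # Assumes an API token with the settings.read scope; replace <TENANT> and the token placeholder.
    curl -G "https://<TENANT>.live.dynatrace.com/api/v2/settings/objects" \
      -H "Authorization: Api-Token <settings.read API-TOKEN>" \
      --data-urlencode "schemaIds=builtin:eec.local" \
      --data-urlencode "fields=objectId,scope,value"

The returned value objects let you confirm the relevant options are enabled per host or host group without opening each settings page.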

Configuration for Apache Spark & Databricks API Metrics (Recommended)

  1. Set up the global init script on your Databricks cluster (a sketch for registering it via the Databricks API follows this list)

    • Change the Dynatrace tenant & API token values
    • NOTE: If your Databricks cluster does not have network access to your Dynatrace cluster or ActiveGate, the Dynatrace-OneAgent-Linux.sh file can be manually uploaded to your Databricks DBFS and the script below can be modified to use that location instead of the wget command.
    #!/usr/bin/env bash
    
    wget  -O Dynatrace-OneAgent-Linux.sh "https://<TENANT>.live.dynatrace.com/api/v1/deployment/installer/agent/unix/default/latest?arch=x86&flavor=default" --header="Authorization: Api-Token <Installer API-TOKEN>"
    /bin/sh Dynatrace-OneAgent-Linux.sh --set-infra-only=true --set-app-log-content-access=true --set-host-id-source="ip-addresses"
    
    
  2. Configure the OneAgent extension in your Dynatrace cluster from

    • Extensions -> Databricks (/ui/hub/ext/com.dynatrace.databricks) -> Add Monitoring Configuration -> Select Databricks Hosts ->
    • Enable the Call Spark API slider
    • Enable the Call Databricks API slider
      • Enter your Databricks URL and user token
  3. Select which feature sets of metrics you'd like to capture

  4. Start (or restart, if you're using an existing All-purpose compute cluster) your Databricks clusters and ensure the OneAgent is connected

  5. Verify metrics show up on the HOST screen of your Databricks cluster's driver node. All the metrics will be attached to that HOST entity.
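
If you manage Databricks configuration as code, the init script from step 1 can also be registered through the Databricks global init scripts REST API instead of the workspace UI. The sketch below is an example under stated assumptions: the file name dynatrace-oneagent-init.sh is hypothetical, and DB_WS_URL / DB_WS_TOKEN stand in for your workspace URL and a personal access token.

    # Minimal sketch: register the step 1 script as an enabled global init script via the Databricks API.
    # DB_WS_URL / DB_WS_TOKEN and the script file name are placeholders -- adjust to your workspace.
    DB_WS_URL="https://adb-XXXXXXXXX.XX.azuredatabricks.net"
    DB_WS_TOKEN="dapiXXXXXXXXXXXXXXXXXXXXXXXXXXX"

    # The API expects the script body base64-encoded (on macOS, use plain `base64` without -w0).
    SCRIPT_B64=$(base64 -w0 dynatrace-oneagent-init.sh)

    curl -X POST "${DB_WS_URL}/api/2.0/global-init-scripts" \
      -H "Authorization: Bearer ${DB_WS_TOKEN}" \
      -H "Content-Type: application/json" \
      -d "{\"name\": \"install-dynatrace-oneagent\", \"script\": \"${SCRIPT_B64}\", \"enabled\": true}"

New or restarted clusters pick the script up on their next start, which lines up with step 4 above.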

Configuration for Ganglia (Legacy)

  1. Create a Dynatrace API token with ReadConfig permissions

  2. Set up Global Init Script on Databricks Cluster

    • Change Dynatrace Tenant & API Token values
    • Change DB_WS_URL & DB_WS_TOKEN Values (from steps above)
    • NOTE: If your Databricks cluster does not have network access to your Dynatrace cluster or ActiveGate, the OneAgent.sh and extension zip file can be manually uploaded to your Databricks DBFS and the script below can be modified to use those locations instead of the wget commands.
    #!/usr/bin/env bash
    
    wget  -O Dynatrace-OneAgent-Linux.sh "https://<TENANT>.live.dynatrace.com/api/v1/deployment/installer/agent/unix/default/latest?arch=x86&flavor=default" --header="Authorization: Api-Token <Installer API-TOKEN>"
    /bin/sh Dynatrace-OneAgent-Linux.sh --set-infra-only=true --set-app-log-content-access=true --set-host-id-source="ip-addresses"
    
    # token with 'ReadConfig' permissions
    wget -O custom_python_databricks_ganglia.zip "https://<TENANT>.live.dynatrace.com/api/config/v1/extensions/custom.python.databricks_ganglia/binary" --header="Authorization: Api-Token <ReadConfig API-TOKEN>"
    unzip custom_python_databricks_ganglia.zip -d /opt/dynatrace/oneagent/plugin_deployment/
    
    # Add Databricks Workspace URL Environment Variable
    cat <<EOF | sudo tee /etc/databricks_env
    DB_WS_URL=https://adb-XXXXXXXXX.XX.azuredatabricks.net
    DB_WS_TOKEN=dapiXXXXXXXXXXXXXXXXXXXXXXXXXXX
    EOF
    
  3. Create a Dynatrace API token with entities.read & entities.write permissions (a sketch for doing this via the API follows this list).

  4. Configure the OneAgent extension in your Dynatrace cluster from

    • Extensions -> Databricks (/ui/hub/ext/com.dynatrace.databricks) -> Add Monitoring Configuration -> Select Databricks Hosts -> Enable the Call Ganglia API slider
    1. Enter your Dynatrace tenant URL and API token
    2. Select which metrics you'd like to capture from Ganglia
  5. Start (or restart if you're using an existing All-purpose compute cluster) your Databricks Clusters and ensure the OneAgent is connected

  6. Verify Metrics are showing up on the included Dashboard
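
The token from step 3 can also be created through the Dynatrace API Tokens endpoint rather than the UI. This is a minimal sketch, assuming you already hold a token with the apiTokens.write scope; the token name used here is purely illustrative.

    # Minimal sketch: create the entities.read / entities.write token required in step 3.
    # Assumes an existing token with the apiTokens.write scope; replace <TENANT> and the token placeholder.
    curl -X POST "https://<TENANT>.live.dynatrace.com/api/v2/apiTokens" \
      -H "Authorization: Api-Token <apiTokens.write API-TOKEN>" \
      -H "Content-Type: application/json" \
      -d '{"name": "databricks-ganglia-extension", "scopes": ["entities.read", "entities.write"]}'

The response JSON contains the new token value, which is what you enter along with your tenant URL in step 4.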


Extension content

Content type | Number of items included
screen metric tables | 1
metric metadata | 74
dashboards | 1
screen injections | 9
screen chart groups | 8

Feature sets

Below is a complete list of the feature sets provided in this version. To ensure a good fit for your needs, individual feature sets can be activated and deactivated by your administrator during configuration.

Metric name | Metric key | Description | Unit
Stage Active Tasks | databricks.spark.job.stage.num_active_tasks | - | Count
Stage Completed Tasks | databricks.spark.job.stage.num_complete_tasks | - | Count
Stage Failed Tasks | databricks.spark.job.stage.num_failed_tasks | - | Count
Stage Killed Tasks | databricks.spark.job.stage.num_killed_tasks | - | Count
Stage Executor Run Time | databricks.spark.job.stage.executor_run_time | - | MilliSecond
Stage Input Bytes | databricks.spark.job.stage.input_bytes | - | Byte
Stage Input Records | databricks.spark.job.stage.input_records | - | Count
Stage Output Bytes | databricks.spark.job.stage.output_bytes | - | Byte
Stage Output Records | databricks.spark.job.stage.output_records | - | Count
Stage Shuffle Read Bytes | databricks.spark.job.stage.shuffle_read_bytes | - | Byte
Stage Shuffle Read Records | databricks.spark.job.stage.shuffle_read_records | - | Count
Stage Shuffle Write Bytes | databricks.spark.job.stage.shuffle_write_bytes | - | Byte
Stage Shuffle Write Records | databricks.spark.job.stage.shuffle_write_records | - | Count
Stage Memory Bytes Spilled | databricks.spark.job.stage.memory_bytes_spilled | - | Byte
Stage Disk Bytes Spilled | databricks.spark.job.stage.disk_bytes_spilled | - | Byte

Metric name | Metric key | Description | Unit
Job Status | databricks.spark.job.status | - | Unspecified
Job Duration | databricks.spark.job.duration | - | Second
Job Total Tasks | databricks.spark.job.total_tasks | - | Count
Job Active Tasks | databricks.spark.job.active_tasks | - | Count
Job Skipped Tasks | databricks.spark.job.skipped_tasks | - | Count
Job Failed Tasks | databricks.spark.job.failed_tasks | - | Count
Job Completed Tasks | databricks.spark.job.completed_tasks | - | Count
Job Active Stages | databricks.spark.job.active_stages | - | Count
Job Completed Stages | databricks.spark.job.completed_stages | - | Count
Job Skipped Stages | databricks.spark.job.skipped_stages | - | Count
Job Failed Stages | databricks.spark.job.failed_stages | - | Unspecified

Metric name | Metric key | Description | Unit
CPU User % | databricks.hardware.cpu.usr | - | Percent
CPU Nice % | databricks.hardware.cpu.nice | - | Percent
CPU System % | databricks.hardware.cpu.sys | - | Percent
CPU IOWait % | databricks.hardware.cpu.iowait | - | Percent
CPU IRQ % | databricks.hardware.cpu.irq | - | Percent
CPU Steal % | databricks.hardware.cpu.steal | - | Percent
CPU Idle % | databricks.hardware.cpu.idle | - | Percent
Memory Used | databricks.hardware.mem.used | - | Byte
Memory Total | databricks.hardware.mem.total | - | KiloByte
Memory Free | databricks.hardware.mem.free | - | KiloByte
Memory Buff/Cache | databricks.hardware.mem.buff_cache | - | KiloByte

Metric name | Metric key | Description | Unit
Executor RDD Blocks | databricks.spark.executor.rdd_blocks | - | Count
Executor Memory Used | databricks.spark.executor.memory_used | - | Byte
Executor Disk Used | databricks.spark.executor.disk_used | - | Byte
Executor Active Tasks | databricks.spark.executor.active_tasks | - | Count
Executor Failed Tasks | databricks.spark.executor.failed_tasks | - | Count
Executor Completed Tasks | databricks.spark.executor.completed_tasks | - | Count
Executor Total Tasks | databricks.spark.executor.total_tasks | - | Count
Executor Duration | databricks.spark.executor.total_duration.count | - | MilliSecond
Executor Input Bytes | databricks.spark.executor.total_input_bytes.count | - | Byte
Executor Shuffle Read | databricks.spark.executor.total_shuffle_read.count | - | Byte
Executor Shuffle Write | databricks.spark.executor.total_shuffle_write.count | - | Byte
Executor Max Memory | databricks.spark.executor.max_memory | - | Byte
Executor Alive Count | databricks.spark.executor.alive_count.gauge | - | Count
Executor Dead Count | databricks.spark.executor.dead_count.gauge | - | Count

Metric name | Metric key | Description | Unit
Databricks Cluster Upsizing Time | databricks.cluster.upsizing_time | Time spent upsizing cluster | MilliSecond

Metric name | Metric key | Description | Unit
RDD Count | databricks.spark.rdd_count.gauge | - | Count
RDD Partitions | databricks.spark.rdd.num_partitions | - | Count
RDD Cached Partitions | databricks.spark.rdd.num_cached_partitions | - | Count
RDD Memory Used | databricks.spark.rdd.memory_used | - | Byte
RDD Disk Used | databricks.spark.rdd.disk_used | - | Byte

Metric name | Metric key | Description | Unit
Streaming Batch Duration | databricks.spark.streaming.statistics.batch_duration | - | MilliSecond
Streaming Receivers | databricks.spark.streaming.statistics.num_receivers | - | Count
Streaming Active Receivers | databricks.spark.streaming.statistics.num_active_receivers | - | Count
Streaming Inactive Receivers | databricks.spark.streaming.statistics.num_inactive_receivers | - | Count
Streaming Completed Batches | databricks.spark.streaming.statistics.num_total_completed_batches.count | - | Count
Streaming Retained Completed Batches | databricks.spark.streaming.statistics.num_retained_completed_batches.count | - | Unspecified
Streaming Active Batches | databricks.spark.streaming.statistics.num_active_batches | - | Count
Streaming Processed Records | databricks.spark.streaming.statistics.num_processed_records.count | - | Count
Streaming Received Records | databricks.spark.streaming.statistics.num_received_records.count | - | Count
Streaming Avg Input Rate | databricks.spark.streaming.statistics.avg_input_rate | - | Byte
Streaming Avg Scheduling Delay | databricks.spark.streaming.statistics.avg_scheduling_delay | - | MilliSecond
Streaming Avg Processing Time | databricks.spark.streaming.statistics.avg_processing_time | - | MilliSecond
Streaming Avg Total Delay | databricks.spark.streaming.statistics.avg_total_delay | - | MilliSecond

Metric name | Metric key | Description | Unit
Application Count | databricks.spark.application_count.gauge | - | Count
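
Once a feature set is enabled and the OneAgent on the driver node is connected, any of the metric keys above can be queried directly. The sketch below uses the Dynatrace Metrics v2 API and assumes an API token with the metrics.read scope; databricks.spark.job.duration is just one example key from the tables.

    # Minimal sketch: query one of the extension's metric keys for the last two hours.
    # Assumes an API token with the metrics.read scope; replace <TENANT> and the token placeholder.
    curl -G "https://<TENANT>.live.dynatrace.com/api/v2/metrics/query" \
      -H "Authorization: Api-Token <metrics.read API-TOKEN>" \
      --data-urlencode "metricSelector=databricks.spark.job.duration" \
      --data-urlencode "from=now-2h"

An empty result typically means the corresponding feature set is disabled in the monitoring configuration or the cluster has not run any jobs in the selected timeframe.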

Related to Databricks

Databricks Workspace

Remotely monitor your Databricks Workspaces!

Full version history

For more information on how to install the downloaded package, follow the instructions on this page.

v1.5.6

  • DXS-3250
    • Update Library Versions


v1.5.5

  • New Feature Set - Hardware Metrics

  • DXS-1597

    • Adds new configuration option - Aggregate Dimensions for Spark API Metrics
  • Updates to how Spark API is called

  • UA Screen updates

  • DXS-1920

    • Adds retry logic to determine driver node during start up of extension
  • Adds ability to ingest Spark Jobs as traces

    • NOTE: Depending on the number of Spark jobs, this could be a significant number of traces and could increase licensing costs.
  • Adds ability to ingest Spark Config as Log Messages


v1.02

  • Initial Release of Extensions 2.0 version of Databricks Extension
  • Offers Support for Ganglia APIs (Legacy), Apache Spark APIs, and Databricks APIs