Spark

Apache Spark monitoring in Dynatrace provides insight into the resource usage, job status, and performance of Spark Standalone clusters.

Monitoring is available for the three main Spark components:

  • Cluster manager
  • Driver program
  • Worker nodes

Apache Spark metrics are presented alongside other infrastructure measurements, enabling in-depth cluster performance analysis of both current and historical data.

Prerequisites

  • Dynatrace OneAgent version 1.105+
  • Linux OS or Windows
  • Spark Standalone cluster manager
  • Spark version 1.6
  • JMX monitoring metrics enabled: turn on JmxSink for all instances by setting the class name org.apache.spark.metrics.sink.JmxSink in conf/metrics.properties. For details, see the Spark documentation.
  • To recognize your cluster, SparkSubmit must be executed with the --master parameter and the master host address. For example:
    # Submit the job to the Standalone master; the --master URL identifies the cluster.
    spark-submit \
    --class de.codecentric.SparkPi \
    --master spark://192.168.33.100:7077 \
    --conf spark.eventLog.enabled=true \
    /vagrant/jars/spark-pi-example-1.0.jar 100
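
The JmxSink prerequisite can be applied with a small shell helper like the one below. This is a minimal sketch, not a Dynatrace or Spark tool: enable_jmx_sink is a hypothetical name, and the sketch assumes you pass it the path to conf/metrics.properties in your Spark installation. The *.sink.jmx.class key follows the form used in Spark's metrics.properties.template, where * applies the sink to every metrics instance (for example, master, worker, driver, and executor).

```shell
#!/usr/bin/env bash
# Sketch: enable Spark's JmxSink for all metrics instances ("*").
# enable_jmx_sink is a hypothetical helper name used for illustration.

# Append the JmxSink definition to the given metrics.properties file,
# creating the file if needed and skipping the append if the line exists.
enable_jmx_sink() {
  local conf_file="$1"
  local sink_line='*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink'

  touch "$conf_file" || return 1

  if ! grep -qxF -- "$sink_line" "$conf_file"; then
    # Make sure the new line does not merge with an unterminated last line.
    if [ -s "$conf_file" ] && [ -n "$(tail -c 1 "$conf_file")" ]; then
      printf '\n' >> "$conf_file"
    fi
    printf '%s\n' "$sink_line" >> "$conf_file"
  fi
}
```

For example, run enable_jmx_sink "$SPARK_HOME/conf/metrics.properties" on each host (assuming SPARK_HOME is set there), then restart the Spark processes so their metrics system picks up the new sink. The helper is idempotent, so running it twice leaves a single sink line.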
    

Enabling Spark monitoring globally

  1. In the navigation menu, select Settings.
  2. Select Monitoring > Monitored technologies.
  3. On the Supported technologies tab, find the Spark row.
  4. Set the Spark switch to the On position.

With Spark monitoring enabled globally, Dynatrace automatically collects Spark metrics whenever a new host running Spark is detected in your environment.

Viewing Spark monitoring insights

  1. In the navigation menu, select Technologies.
  2. Click the Spark tile on the Technology overview page.
  3. To view cluster metrics, expand the Details section of the Spark process group.
  4. Click the Process group details button.
  5. On the Process group details page, click the Technology-specific metrics tab. Here you’ll find metrics for all Spark components.
  6. Further down the page, you’ll find a number of other cluster-specific charts.

The cluster charts section covers jobs, stages, messages, workers, and message processing. When jobs fail, the cause is typically a lack of cores or RAM. Note that for both cores and RAM, the maximum value is not your system's maximum; it's the maximum defined by your Spark configuration. The workers chart shows immediately when one of your nodes goes down.
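
As an illustration of where those maximums come from: in a Standalone cluster, the cores and memory each worker offers are commonly set in conf/spark-env.sh through SPARK_WORKER_CORES and SPARK_WORKER_MEMORY. The fragment below is a sketch with placeholder values, not recommended settings; if your cluster sets these limits elsewhere, the charts reflect that configuration instead.

```shell
# conf/spark-env.sh on each worker host (sourced by Spark's launch scripts).
# Placeholder values: adjust to the resources each worker should offer.

# Number of cores this worker offers to applications (default: all cores).
export SPARK_WORKER_CORES=8

# Total memory this worker offers to executors, e.g. 1000m or 2g
# (default: total machine memory minus 1 GiB).
export SPARK_WORKER_MEMORY=16g
```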

Spark node/worker monitoring

  1. To access Spark node metrics, select a worker from the Process list on the Process group details page.
  2. Click the Apache Spark worker tab.

Spark cluster metrics

Metric               Description
All jobs             Number of jobs.
Active jobs          Number of active jobs.
Waiting jobs         Number of waiting jobs.
Failed jobs          Number of failed jobs.
Running jobs         Number of running jobs.
Count                Number of messages in the scheduler’s event-processing loop.
Calls per sec        Calls per second.
Number of workers    Number of workers.
Alive workers        Number of alive workers.
Number of apps       Number of running applications.
Waiting apps         Number of waiting applications.
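
The "node down" signal from the workers chart can also be expressed as a comparison of two of the cluster metrics listed above: if Alive workers is lower than Number of workers, at least one worker is not alive. The sketch below assumes you already have both values as non-negative integers (how you retrieve them is outside its scope); check_workers is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: flag lost workers by comparing the "Number of workers" and
# "Alive workers" cluster metrics. check_workers is a hypothetical helper;
# it assumes both values are already available as non-negative integers.

check_workers() {
  local total="$1" alive="$2"
  local down=$(( total - alive ))

  if (( down > 0 )); then
    echo "WARN: ${down} of ${total} workers are not alive"
    return 1
  fi
  echo "OK: all ${total} workers are alive"
}
```

The non-zero return status on failure lets the helper be used directly in a shell condition or a cron-driven check.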

Spark worker metrics

Metric                Description
Number of free cores  Number of free cores on the worker.
Number of cores used  Number of cores used on the worker.
Number of executors   Number of executors on the worker.
Used MB               Amount of worker memory used (megabytes).
Free MB               Amount of free worker memory (megabytes).
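
Because the maximum for cores and RAM is defined by your Spark configuration, a worker's used and free values add up to what it is configured to offer, so a utilization percentage can be derived from each used/free pair in the table above. The sketch below assumes that relationship holds; utilization_pct is a hypothetical helper, and the shell's integer arithmetic truncates the result.

```shell
#!/usr/bin/env bash
# Sketch: percentage of a worker's configured capacity in use, computed from
# a "used" and a "free" metric (Used MB / Free MB, or cores used / free cores).
# Assumes used + free equals the capacity configured for the worker.
# utilization_pct is a hypothetical helper; integer arithmetic truncates.

utilization_pct() {
  local used="$1" free="$2"
  local total=$(( used + free ))

  if (( total == 0 )); then
    echo 0  # nothing offered: report 0% instead of dividing by zero
    return
  fi
  echo $(( used * 100 / total ))
}
```

For example, utilization_pct 12288 4096 prints 75 for a worker using 12,288 MB of a 16,384 MB allotment, and utilization_pct 6 2 prints 75 for 6 of 8 cores in use.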