What is Apache Spark?

Question

Accepted Answer

Apache Spark is an open-source cluster-computing framework. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since. Spark provides an interface for programming entire clusters with implicit data parallelism and fault-tolerance. As a fast, in-memory data processing engine Apache Spark allows data workers to efficiently execute various tasks. Examples are streaming, machine learning or SQL workloads that require fast iterative access to datasets.
Dynatrace monitors and analyzes the activity of your Apache Spark processes, providing Spark-specific metrics alongside all infrastructure measurements. With Apache Spark monitoring enabled globally, Dynatrace automatically collects Spark metrics whenever a new host running Spark is detected in your environment. Therefore Apache Spark monitoring provides insight into the resource usage, job status, and performance of Spark Standalone clusters.

Apache Spark performance

In-depth cluster performance analysis

Details about all of your Apache Spark components

Pinpoint problems at the code level