Elasticsearch monitoring

Deprecation notice

This extension documentation is now deprecated and will no longer be updated. We recommend using the new Elasticsearch extension for improved functionality and support.

Dynatrace Elasticsearch monitoring provides a high-level overview of all Elasticsearch components within each monitored cluster in your environment.

Elasticsearch health metrics show the state of your monitored clusters. When a problem occurs, you can see which nodes are affected and drill down into the metrics of individual nodes to find the root cause of problems and potential bottlenecks.

Prerequisites

Elasticsearch monitoring in Dynatrace requires:

Elasticsearch 2.3+
Linux OS or Windows
OneAgent installed on all Elasticsearch nodes

Docker support

Dynatrace supports Elasticsearch running inside a Docker container with OneAgent version 1.157+.

Image

You need to have an Elasticsearch Docker image, version 6.0.1+.

Container configuration

All instances must use the same port for the REST API (default: 9200). You can change this port via the http.port setting.
The REST port must be exposed to the host.

Example configuration: docker run -p 45709:1200 -e "discovery.type=single-node" -e "http.port=1200" docker.elastic.co/elasticsearch/elasticsearch:6.0.1
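
To make the port relationship in the example above concrete, here is a minimal Python sketch that assembles the same docker run arguments from a host port and a REST port. The function name build_docker_run and its parameters are hypothetical and exist only for illustration; the image name and environment settings are taken from the example.

```python
import shlex


def build_docker_run(host_port, http_port, image):
    """Return docker run arguments that publish the REST port to the host."""
    return [
        "docker", "run",
        # Publish the container's REST port on the host (host:container).
        "-p", f"{host_port}:{http_port}",
        "-e", "discovery.type=single-node",
        # The REST port inside the container must match the published port.
        "-e", f"http.port={http_port}",
        image,
    ]


args = build_docker_run(
    45709, 1200, "docker.elastic.co/elasticsearch/elasticsearch:6.0.1"
)
print(shlex.join(args))
```

Deriving both the -p mapping and http.port from one value keeps the two from drifting apart, which is the point of the requirement that the REST port be exposed to the host.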

Enabling Elasticsearch monitoring globally

With Elasticsearch monitoring enabled globally, Dynatrace automatically collects Elasticsearch metrics whenever a new host running Elasticsearch is detected in your environment.

Go to Settings.
Select Monitoring > Monitored technologies.
Find the Elasticsearch entry and expand it for editing.
- User and Password are user credentials that have to work for all Elasticsearch hosts that you want to monitor. Leave them empty if no authentication is set up.
- URL is the Elasticsearch URL.
To configure Elasticsearch per host rather than globally, set the global Elasticsearch switch here to the Off position and click the host settings link to begin configuring Elasticsearch at the host level. See Enabling Elasticsearch monitoring for individual hosts below for details.
Select Save to save any changes.
Turn on the Elasticsearch switch to enable Elasticsearch monitoring globally.
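
Before entering the User, Password, and URL values, it can help to confirm that they work against an Elasticsearch host. The following is a minimal sketch of such a check, not part of Dynatrace itself. It assumes HTTP basic authentication and uses the Elasticsearch cluster health endpoint (_cluster/health); the function names and the sample URL and credentials are hypothetical.

```python
import base64
import urllib.request
from typing import Optional


def build_health_request(
    url: str, user: Optional[str] = None, password: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a GET request for the cluster health endpoint."""
    request = urllib.request.Request(
        url.rstrip("/") + "/_cluster/health", method="GET"
    )
    if user:
        # Assumption: the cluster uses HTTP basic authentication.
        # Leave user and password empty when no authentication is set up.
        raw = f"{user}:{password or ''}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def check_credentials(url, user=None, password=None, timeout=5.0):
    """Send the request and return the HTTP status code.

    urllib raises urllib.error.HTTPError for responses such as 401,
    so a normal return means the URL and credentials were accepted.
    """
    request = build_health_request(url, user, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, check_credentials("http://localhost:9200", "monitor_user", "secret") would be expected to return 200 when the hypothetical user can read cluster health on that host.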

Enabling Elasticsearch monitoring for individual hosts

Dynatrace also offers the option of enabling Elasticsearch monitoring for specific hosts rather than globally.

If Elasticsearch monitoring is currently switched on, switch it off: go to Settings > Monitoring > Monitored technologies and set the Elasticsearch switch to the Off position.
Go to Settings.
Select Monitoring > Monitoring overview.
Select the Hosts tab.
Find the host on which you want to enable Elasticsearch monitoring and select Edit.
The Monitored technologies section shows:
- A list of technologies that are currently being monitored globally. This list should not include Elasticsearch if you are configuring Elasticsearch monitoring for individual hosts.
- A table of technologies that you can enable on this host: Technology, Type, Monitoring, Edit.
Find Elasticsearch in the Technology column and click in the Edit column to display Elasticsearch host-level configuration settings.
- User and Password are user credentials for Elasticsearch on this host. Leave them empty if no authentication is set up.
- URL is the Elasticsearch URL.
Select Save to save any changes.
Turn on the Elasticsearch switch to enable Elasticsearch monitoring for the selected host.

Viewing Elasticsearch monitoring insights

Go to Technologies & Processes or Technologies & Processes Classic (latest Dynatrace).
In the Technology overview section, select the Elasticsearch tile.
Individual Elasticsearch clusters are represented as process groups. All detected Elasticsearch process groups are listed in the table at the bottom of the page.
To view metrics for a specific cluster, locate it in the table and select the expand control in the Details column to expand that row.
A chart shows the number of process group instances over the selected time range.
To see details, select the Process group details button.
On the Process group details page, in addition to system performance and networking metrics, you can select the Technology-specific metrics tab to display Elasticsearch cluster charts and metrics.
- Change the Show chart for selection to chart a different Elasticsearch cluster metric.
- All processes in the selected process group are listed at the bottom.
Select a process to display the details page for the selected process. In addition to general process status information, it has two Elasticsearch-specific tabs: Elasticsearch metrics and Further details.
Select the Elasticsearch metrics tab to display charts for Elasticsearch key metrics:
- Indexing (indexing total over the selected time range) shows the effectiveness of all indexing operations.
- Search (number of queries, fetches, and scrolls over the same time range) is an indicator of how efficient your search operations are. More operations in a shorter time interval indicate better performance.
Select the Further details tab to display the Process metrics page, which charts essential Elasticsearch metrics. You can filter these charts by cluster and node.
- Breakers
  Elasticsearch circuit breakers are thresholds used to prevent operations from causing an OutOfMemoryError. Each breaker specifies a limit for how much memory it can use. If the estimated query size is larger than the limit, the circuit breaker is tripped, the query is aborted, and an exception is returned. This happens before data is loaded, which means that an OutOfMemoryError is avoided.
  - Limit size
  - Estimated size
  - Overhead
  - Tripped
- Indices
  Shows additional in-depth information about Elasticsearch indices. Of particular interest is the Translog chart, which shows whether Elasticsearch is keeping up with the data coming in by flushing it out to the indices on disk.
- Merge
  Can show the root cause of problems when a system is under too much load and merging can’t keep up.
- Search
  Shows additional in-depth information around Elasticsearch search operations, with performance charts for queries, fetches, and scrolls.
- Thread pools
  Shows details about how much load the system is currently processing. Enables you to see if you can increase the rate of queries or the number of writes. Also enables you to see if there’s a bottleneck in one of the thread pools.
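
The circuit breaker behavior described under Breakers above can be illustrated with a small model. This is a simplified sketch, not Elasticsearch's implementation: it assumes each estimate is multiplied by the overhead factor before being compared with the limit, and the class and exception names are hypothetical. The field names mirror the four breaker values listed above.

```python
from dataclasses import dataclass


class CircuitBreakingError(Exception):
    """Raised in this sketch when a breaker trips (hypothetical name)."""


@dataclass
class Breaker:
    limit_size_in_bytes: int
    overhead: float = 1.0
    estimated_size_in_bytes: int = 0
    tripped: int = 0

    def reserve(self, estimate_in_bytes: int) -> None:
        """Account for an operation before any data is loaded."""
        # Assumption: the raw estimate is scaled by the overhead factor.
        projected = self.estimated_size_in_bytes + int(
            estimate_in_bytes * self.overhead
        )
        if projected > self.limit_size_in_bytes:
            # Trip the breaker: count it and abort instead of allocating.
            self.tripped += 1
            raise CircuitBreakingError(
                f"estimated {projected} bytes exceeds limit "
                f"{self.limit_size_in_bytes} bytes"
            )
        self.estimated_size_in_bytes = projected
```

In this model, a rising Tripped count while Estimated size stays close to Limit size would suggest that operations are regularly being rejected for memory reasons.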

Supported metrics

These tables list all supported Elasticsearch metrics. A full description of all Elasticsearch statistics is available at www.elastic.co. Most Elasticsearch metrics are taken directly from Elasticsearch statistics and presented as is, with no additional computation.
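
Because most metrics are presented as is, a dotted metric name can be read as a path into the nested statistics that Elasticsearch returns. The sketch below illustrates that reading. It assumes that the leading node. prefix in the instance metric names refers to a per-node object of a node statistics response; the helper names and the trimmed sample data are hypothetical.

```python
from typing import Any, Mapping


def lookup(stats: Mapping[str, Any], dotted_key: str) -> Any:
    """Walk a nested mapping using a dotted key such as 'indices.store.size_in_bytes'."""
    value: Any = stats
    for part in dotted_key.split("."):
        value = value[part]  # raises KeyError if the statistic is absent
    return value


def read_instance_metric(node_stats: Mapping[str, Any], metric_name: str) -> Any:
    """Resolve an instance metric name against one node's statistics."""
    prefix = "node."
    # Assumption: 'node.' only marks the metric as per-node and is not
    # itself a key in the statistics object.
    if metric_name.startswith(prefix):
        metric_name = metric_name[len(prefix):]
    return lookup(node_stats, metric_name)


# Hypothetical, heavily trimmed statistics for a single node.
sample_node_stats = {
    "indices": {
        "search": {"query_total": 1200, "query_time_in_millis": 3400},
        "store": {"size_in_bytes": 52428800},
    }
}

print(read_instance_metric(sample_node_stats, "node.indices.search.query_total"))
```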

Process group metrics

Process group metric	Description
status-green	Status green
status-yellow	Status yellow
status-red	Status red
status-unknown	Status unknown
number_of_nodes	Number of nodes
number_of_data_nodes	Number of data nodes
active_primary_shards	Active primary shards
active_shards	Active shards
relocating_shards	Relocating shards
initializing_shards	Initializing shards
unassigned_shards	Unassigned shards
delayed_unassigned_shards	Delayed unassigned shards
indices.count	Indices count
indices.shards.replication	Replica shards
indices.docs.count	Documents count
indices.docs.deleted	Deleted documents
indices.fielddata.memory_size_in_bytes	Field data size
indices.fielddata.evictions	Field data evictions
indices.query_cache.cache_size	Query cache size
indices.query_cache.cache_count	Query cache count
indices.query_cache.evictions	Query cache evictions
indices.segments.count	Segment count
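
The four status metrics at the top of this table suggest one flag per cluster state. How Dynatrace derives them is not documented here, so the following sketch is an assumption: it treats them as flags computed from the status value (green, yellow, or red) reported by the Elasticsearch cluster health API, with anything else counted as unknown. The function name is hypothetical.

```python
def status_flags(status):
    """Map a cluster health status string to one flag per status metric."""
    known = ("green", "yellow", "red")
    # Exactly one flag is set to 1; the rest are 0.
    flags = {f"status-{name}": int(status == name) for name in known}
    # Assumption: a missing or unrecognized status counts as unknown.
    flags["status-unknown"] = int(status not in known)
    return flags


print(status_flags("yellow"))
```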

Instance metrics

Instance metric	Description
node.indices.store.size_in_bytes	Store size
node.indices.store.throttle_time_in_millis	Store throttle time
node.indices.indexing.throttle_time_in_millis	Indexing throttle time
node.indices.indexing.index_time_in_millis	Indexing time
node.indices.indexing.index_total	Indexing total
node.indices.indexing.delete_total	Indexing delete
node.indices.indexing.index_failed	Indexing failed
node.indices.indexing.noop_update_total	Indexing noop update total
node.indices.search.query_time_in_millis	Query time
node.indices.search.query_total	Number of queries
node.indices.search.fetch_total	Number of fetches
node.indices.search.fetch_time_in_millis	Fetch time
node.indices.search.scroll_time_in_millis	Scroll time
node.indices.search.scroll_total	Number of scrolls
node.indices.search.local_total_time_in_millis	Total search time
node.indices.merges.total	Merge total
node.indices.merges.total_time_in_millis	Merge total time
node.indices.merges.total_docs	Merge total documents
node.indices.merges.total_size_in_bytes	Merge total size
node.indices.merges.total_stopped_time_in_millis	Merge stopped time
node.indices.merges.total_throttled_time_in_millis	Merge throttled time
node.indices.merges.total_auto_throttle_in_bytes	Merge auto throttle size
node.indices.refresh.total	Indices refresh total
node.indices.refresh.total_time_in_millis	Indices refresh time
node.indices.flush.total	Indices flush total
node.indices.flush.total_time_in_millis	Indices flush time
node.indices.warmer.total	Indices warmer total
node.indices.warmer.total_time_in_millis	Indices warmer time
node.indices.translog.operations	Indices translog operations
node.indices.translog.size_in_bytes	Indices translog size
node.indices.suggest.total	Indices suggest total
node.indices.suggest.time_in_millis	Indices suggest time
node.indices.request_cache.memory_size_in_bytes	Indices request cache size
node.indices.request_cache.evictions	Indices request cache evictions
node.indices.request_cache.hit_count	Indices request cache hit count
node.indices.request_cache.miss_count	Indices request cache miss count
node.indices.recovery.current_as_source	Indices recovery current as source
node.indices.recovery.current_as_target	Indices recovery current as target
node.indices.recovery.throttle_time_in_millis	Indices recovery throttle time
node.breakers.request.limit_size_in_bytes	Breakers request limit size
node.breakers.request.estimated_size_in_bytes	Breakers request estimated size
node.breakers.request.overhead	Breakers request overhead
node.breakers.request.tripped	Breakers request tripped
node.breakers.fielddata.limit_size_in_bytes	Breakers field data limit size
node.breakers.fielddata.estimated_size_in_bytes	Breakers field data estimated size
node.breakers.fielddata.overhead	Breakers field data overhead
node.breakers.fielddata.tripped	Breakers field data tripped
node.breakers.parent.limit_size_in_bytes	Breakers parent data limit size
node.breakers.parent.estimated_size_in_bytes	Breakers parent data estimated size
node.breakers.parent.overhead	Breakers parent data overhead
node.breakers.parent.tripped	Breakers parent data tripped
node.thread_pool.percolate.queue	Thread pools percolate queue
node.thread_pool.percolate.completed	Thread pools percolate completed
node.thread_pool.percolate.threads	Thread pools percolate threads
node.thread_pool.percolate.rejected	Thread pools percolate rejected
node.thread_pool.listener.queue	Thread pools listener queue
node.thread_pool.listener.completed	Thread pools listener completed
node.thread_pool.listener.threads	Thread pools listener threads
node.thread_pool.listener.rejected	Thread pools listener rejected
node.thread_pool.search.queue	Thread pools search queue
node.thread_pool.search.completed	Thread pools search completed
node.thread_pool.search.threads	Thread pools search threads
node.thread_pool.search.rejected	Thread pools search rejected
node.thread_pool.get.queue	Thread pools get queue
node.thread_pool.get.completed	Thread pools get completed
node.thread_pool.get.threads	Thread pools get threads
node.thread_pool.get.rejected	Thread pools get rejected
node.thread_pool.bulk.queue	Thread pools bulk queue
node.thread_pool.bulk.completed	Thread pools bulk completed
node.thread_pool.bulk.threads	Thread pools bulk threads
node.thread_pool.bulk.rejected	Thread pools bulk rejected
node.thread_pool.index.queue	Thread pools index queue
node.thread_pool.index.completed	Thread pools index completed
node.thread_pool.index.threads	Thread pools index threads
node.thread_pool.index.rejected	Thread pools index rejected
node.thread_pool.force_merge.queue	Thread pools force merge queue
node.thread_pool.force_merge.completed	Thread pools force merge completed
node.thread_pool.force_merge.threads	Thread pools force merge threads
node.thread_pool.force_merge.rejected	Thread pools force merge rejected
node.thread_pool.analyze.queue	Thread pools analyze queue
node.thread_pool.analyze.completed	Thread pools analyze completed
node.thread_pool.analyze.threads	Thread pools analyze threads
node.thread_pool.analyze.rejected	Thread pools analyze rejected
node.thread_pool.refresh.queue	Thread pools refresh queue
node.thread_pool.refresh.completed	Thread pools refresh completed
node.thread_pool.refresh.threads	Thread pools refresh threads
node.thread_pool.refresh.rejected	Thread pools refresh rejected
node.thread_pool.generic.queue	Thread pools generic queue
node.thread_pool.generic.completed	Thread pools generic completed
node.thread_pool.generic.threads	Thread pools generic threads
node.thread_pool.generic.rejected	Thread pools generic rejected
node.thread_pool.flush.queue	Thread pools flush queue
node.thread_pool.flush.completed	Thread pools flush completed
node.thread_pool.flush.threads	Thread pools flush threads
node.thread_pool.flush.rejected	Thread pools flush rejected
node.thread_pool.write.queue	Thread pools write queue
node.thread_pool.write.completed	Thread pools write completed
node.thread_pool.write.threads	Thread pools write threads
node.thread_pool.write.rejected	Thread pools write rejected
node.thread_pool.snapshot.queue	Thread pools snapshot queue
node.thread_pool.snapshot.completed	Thread pools snapshot completed
node.thread_pool.snapshot.threads	Thread pools snapshot threads
node.thread_pool.snapshot.rejected	Thread pools snapshot rejected
node.thread_pool.ccr.queue	Thread pools ccr queue
node.thread_pool.ccr.completed	Thread pools ccr completed
node.thread_pool.ccr.threads	Thread pools ccr threads
node.thread_pool.ccr.rejected	Thread pools ccr rejected
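
Many of the metrics above come in pairs of an operation count and a time in milliseconds, for example node.indices.search.query_total and node.indices.search.query_time_in_millis. Elasticsearch reports these as cumulative counters, so a per-operation average for a time range can be derived from the difference between two samples. The sketch below shows that arithmetic; the function name and sample values are hypothetical, and Dynatrace charts may aggregate differently.

```python
def average_ms_per_operation(previous, current, total_key, time_key):
    """Average milliseconds per operation between two counter samples.

    Returns None when no operations completed in the interval or when
    the counters went backwards (for example, after a node restart).
    """
    operations = current[total_key] - previous[total_key]
    elapsed_ms = current[time_key] - previous[time_key]
    if operations <= 0 or elapsed_ms < 0:
        return None
    return elapsed_ms / operations


# Two hypothetical samples of the search counters, taken some time apart.
earlier = {"query_total": 100, "query_time_in_millis": 500}
later = {"query_total": 150, "query_time_in_millis": 800}

print(average_ms_per_operation(earlier, later, "query_total", "query_time_in_millis"))
```

The same calculation applies to other count and time pairs in the table, such as the fetch, scroll, merge, refresh, and flush metrics.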