Elasticsearch monitoring

Dynatrace Elasticsearch monitoring provides a high-level overview of all Elasticsearch components within each monitored cluster in your environment.

Elasticsearch health metrics show the overall health of your monitored clusters. When a problem occurs, you can quickly see which nodes are affected and drill down into the metrics of individual nodes to find the root cause of problems and potential bottlenecks.

Prerequisites

Elasticsearch monitoring in Dynatrace requires:

  • Elasticsearch 2.3+
  • Linux or Windows
  • OneAgent installed on all Elasticsearch nodes

Docker support

Dynatrace supports Elasticsearch running inside a Docker container with OneAgent version 1.157+.

Image

You need an Elasticsearch Docker image, version 6.0.1+.

Container configuration

  • All instances must use the same port for the REST API (default: 9200). You can change the port with the http.port setting.
  • The REST port must be published (exposed) to the host.

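Example configuration, which sets the REST port inside the container to 1200 with http.port and publishes it on host port 45709:

    docker run -p 45709:1200 -e "discovery.type=single-node" -e "http.port=1200" docker.elastic.co/elasticsearch/elasticsearch:6.0.1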
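The two container rules above (same REST port everywhere, REST port published to the host) can be checked mechanically before you start monitoring. The following Python sketch parses `docker run` command lines and reports violations. It is a hypothetical helper, not part of Dynatrace or Docker, and it makes simplifying assumptions: it only understands the `-p`/`--publish` and `-e`/`--env` forms used in the example, and it ignores other ways of setting `http.port`, such as a mounted `elasticsearch.yml`.

```python
from __future__ import annotations

import shlex

DEFAULT_HTTP_PORT = 9200  # Elasticsearch's default REST API port


def rest_port_settings(command: str) -> tuple[int, set[int]]:
    """Return (REST port inside the container, container ports published with -p)."""
    args = shlex.split(command)
    http_port = DEFAULT_HTTP_PORT
    published: set[int] = set()
    # Look at each flag together with the argument that follows it.
    for flag, value in zip(args, args[1:]):
        if flag in ("-e", "--env") and value.startswith("http.port="):
            http_port = int(value.split("=", 1)[1])
        elif flag in ("-p", "--publish"):
            # Accepts CONTAINER, HOST:CONTAINER, or IP:HOST:CONTAINER, optionally with /tcp.
            published.add(int(value.split(":")[-1].split("/")[0]))
    return http_port, published


def check_instances(commands: list[str]) -> list[str]:
    """Report violations of the two container rules; an empty list means none were found."""
    problems: list[str] = []
    rest_ports: set[int] = set()
    for command in commands:
        http_port, published = rest_port_settings(command)
        rest_ports.add(http_port)
        if http_port not in published:
            problems.append(f"REST port {http_port} is not published to the host: {command}")
    if len(rest_ports) > 1:
        problems.append(f"instances use different REST ports: {sorted(rest_ports)}")
    return problems
```

Running `check_instances` on the example configuration alone returns an empty list, because port 1200 is both the REST port and the published container port. Adding a second instance that keeps the default port 9200 produces a "different REST ports" finding.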

Enabling Elasticsearch monitoring globally

With Elasticsearch monitoring enabled globally, Dynatrace automatically collects Elasticsearch metrics whenever a new host running Elasticsearch is detected in your environment.

  1. In the navigation menu, select Settings.

  2. Select Monitoring > Monitored technologies.

  3. Find the Elasticsearch entry and expand it for editing.

    • User and Password are user credentials that must be valid on all Elasticsearch hosts that you want to monitor. Leave them empty if no authentication is set up.
    • URL is the Elasticsearch URL.
    Note

    To configure Elasticsearch monitoring per host rather than globally, set the global Elasticsearch switch here to the Off position and select the host settings link to begin configuring Elasticsearch at the host level. For details, see Enabling Elasticsearch monitoring for individual hosts below.

  4. Select Save to save any changes.

  5. Set the Elasticsearch switch to the On position to enable Elasticsearch monitoring globally.
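Because User and Password must work on every monitored host, it can be worth checking them yourself against each node before saving. The following Python sketch (standard library only) calls the standard Elasticsearch `_cluster/health` endpoint on each URL. It assumes your cluster uses HTTP basic authentication; the helper names, URLs, and credentials are placeholders, and the sketch does not reproduce how OneAgent itself collects metrics.

```python
from __future__ import annotations

import base64
import json
import urllib.error
import urllib.request


def health_request(url: str, user: str = "", password: str = "") -> urllib.request.Request:
    """Build a GET request for the cluster health API, with basic auth if a user is given."""
    request = urllib.request.Request(url.rstrip("/") + "/_cluster/health")
    if user:
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def check_credentials(urls: list[str], user: str = "", password: str = "",
                      timeout: float = 5.0) -> dict[str, str]:
    """Return the reported cluster status per URL, or a short error description."""
    results: dict[str, str] = {}
    for url in urls:
        try:
            with urllib.request.urlopen(health_request(url, user, password), timeout=timeout) as response:
                results[url] = json.load(response).get("status", "unknown")
        except urllib.error.HTTPError as err:
            # A 401 response usually means this host rejected the credentials.
            results[url] = f"HTTP {err.code}"
        except OSError as err:
            # URLError, connection refused, and timeouts all derive from OSError.
            results[url] = f"unreachable: {err}"
    return results
```

For example, `check_credentials(["http://es-node-1:9200", "http://es-node-2:9200"], "monitoring_user", "changeme")` (hypothetical hosts and credentials) should report a status such as `green` for every node; any `HTTP 401` entry indicates a host where the shared credentials will not work.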

Enabling Elasticsearch monitoring for individual hosts

Dynatrace also offers the option of enabling Elasticsearch monitoring for specific hosts rather than globally.

  1. If Elasticsearch monitoring is currently switched on, switch it off: go to Settings > Monitoring > Monitored technologies and set the Elasticsearch switch to the Off position.
  2. In the navigation menu, select Settings.
  3. Select Monitoring > Monitoring overview.
  4. Select the Hosts tab.
  5. Find the host on which you want to enable Elasticsearch monitoring and select Edit.
  6. The Monitored technologies section shows:
    • A list of technologies that are currently being monitored globally. This list should not include Elasticsearch if you are configuring Elasticsearch monitoring for individual hosts.
    • A table of technologies that you can enable on this host: Technology, Type, Monitoring, Edit.
  7. Find Elasticsearch in the Technology column and select Edit in that row to display the Elasticsearch host-level configuration settings.
    • User and Password are user credentials for Elasticsearch on this host. Leave them empty if no authentication is set up.
    • URL is the Elasticsearch URL.
  8. Select Save to save any changes.
  9. Set the Elasticsearch switch to the On position to enable Elasticsearch monitoring for the selected host.

Viewing Elasticsearch monitoring insights

  1. In the navigation menu, select Technologies.
  2. In the Technology overview section, select the Elasticsearch tile.
    Individual Elasticsearch clusters are represented as process groups. All detected Elasticsearch process groups are listed in the table at the bottom of the page.
  3. To view metrics for a specific cluster, locate it in the table and select its entry in the Details column to expand that row.
    A chart shows the number of process group instances over the selected time range.
  4. To see details, select the Process group details button.
  5. On the Process group details page, in addition to system performance and networking metrics, you can select the Technology-specific metrics tab to display Elasticsearch cluster charts and metrics.
    • Change the Show chart for selection to chart a different Elasticsearch cluster metric.
    • All processes in the selected process group are listed at the bottom.
  6. Select a process to display its details page. In addition to general process status information, it has two Elasticsearch-specific tabs: Elasticsearch metrics and Further details.
  7. Select the Elasticsearch metrics tab to display charts for Elasticsearch key metrics:
    • Indexing (indexing total over the selected time range) shows the number of indexing operations, a measure of indexing throughput.
    • Search (number of queries, fetches, and scrolls over the same time range) indicates how efficient your search operations are. Completing more operations in a shorter time interval indicates better performance.
  8. Select the Further details tab to display the Process metrics page, which charts essential Elasticsearch metrics. You can filter these charts by cluster and node.
    • Breakers
      Elasticsearch circuit breakers are thresholds that prevent operations from causing an OutOfMemoryError. Each breaker specifies a limit for how much memory it can use. If the estimated size of a request is larger than the limit, the circuit breaker trips, the request is aborted, and an exception is returned. Because this check happens before data is loaded, the OutOfMemoryError is avoided.
      • Limit size
      • Estimated size
      • Overhead
      • Tripped
    • Indices
      Shows additional in-depth information about Elasticsearch indices. Of particular interest is the Translog chart, which shows whether Elasticsearch keeps up with incoming data by flushing it to the indices on disk.
    • Merge
      Helps identify the root cause of problems when the system is under too much load and segment merging can’t keep up.
    • Search
      Shows additional in-depth information about Elasticsearch search operations, with performance charts for queries, fetches, and scrolls.
    • Thread pools
      Shows how much load the system is currently processing. Use it to judge whether the system can handle a higher query rate or more writes, and to spot a bottleneck in one of the thread pools.
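The Indexing and Search charts build on counters that Elasticsearch reports cumulatively since node startup (for example, `indices.search.query_total` in the node stats API). A rate is therefore the difference between two samples divided by the time between them, and "more operations in a shorter interval" is exactly that ratio getting larger. The following Python sketch shows the arithmetic, plus an average query time derived from `query_time_in_millis`. The `StatsSample` type and `throughput` function are hypothetical, and Dynatrace may compute its charts differently.

```python
from __future__ import annotations

from dataclasses import dataclass

# Cumulative counters taken from nodes.<id>.indices in a _nodes/stats response.
COUNTERS = ("index_total", "query_total", "query_time_in_millis", "fetch_total", "scroll_total")


@dataclass
class StatsSample:
    """Hypothetical container for one node's counters at one point in time."""
    timestamp_s: float
    index_total: int
    query_total: int
    query_time_in_millis: int
    fetch_total: int
    scroll_total: int


def throughput(before: StatsSample, after: StatsSample) -> dict[str, float]:
    """Per-second rates and average query time between two samples."""
    elapsed = after.timestamp_s - before.timestamp_s
    if elapsed <= 0:
        raise ValueError("the second sample must be taken after the first")
    delta = {name: getattr(after, name) - getattr(before, name) for name in COUNTERS}
    if any(value < 0 for value in delta.values()):
        # Counters restart from zero when the node restarts; this interval is unusable.
        raise ValueError("counter reset detected")
    queries = delta["query_total"]
    return {
        "indexing_per_s": delta["index_total"] / elapsed,
        "queries_per_s": queries / elapsed,
        "fetches_per_s": delta["fetch_total"] / elapsed,
        "scrolls_per_s": delta["scroll_total"] / elapsed,
        # Mean query time per query in this interval; 0.0 when no queries ran.
        "avg_query_ms": delta["query_time_in_millis"] / queries if queries else 0.0,
    }
```

For two samples taken 10 seconds apart in which `query_total` grew by 200 and `query_time_in_millis` by 1000, the sketch reports 20 queries per second at an average of 5 ms each. Raising on a counter reset, rather than returning a negative rate, keeps a node restart from showing up as a dip in throughput.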

Supported metrics

The following sections list all supported Elasticsearch metrics. For a full description of all Elasticsearch statistics, see the Elasticsearch documentation at www.elastic.co. Most metrics are taken directly from Elasticsearch statistics and presented as is, with no additional computation.
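Because most metrics are presented as is, the instance metric keys below appear to mirror field paths in the Elasticsearch node stats API (`GET _nodes/stats`); for example, `node.indices.store.size_in_bytes` matches `indices.store.size_in_bytes` under each node. This correspondence is inferred from the key names, not documented Dynatrace behavior. The following Python sketch flattens a node stats response into keys of that form, which can help when comparing Dynatrace charts with raw Elasticsearch output. `flatten` and `node_metrics` are hypothetical helpers.

```python
from __future__ import annotations

from typing import Any


def flatten(stats: dict[str, Any], prefix: str = "") -> dict[str, float]:
    """Flatten nested JSON into dotted keys, keeping only numeric leaves."""
    flat: dict[str, float] = {}
    for key, value in stats.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[path] = value
    return flat


def node_metrics(nodes_stats: dict[str, Any]) -> dict[str, dict[str, float]]:
    """Map each node ID in a _nodes/stats response to keys like 'node.indices.store.size_in_bytes'."""
    return {node_id: flatten(node, "node") for node_id, node in nodes_stats.get("nodes", {}).items()}
```

String fields such as the node name are dropped, so only values that could appear in a chart remain.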

Process group metrics

Each process group metric is listed with its description:

  • status-green: Status green
  • status-yellow: Status yellow
  • status-red: Status red
  • status-unknown: Status unknown
  • number_of_nodes: Number of nodes
  • number_of_data_nodes: Number of data nodes
  • active_primary_shards: Active primary shards
  • active_shards: Active shards
  • relocating_shards: Relocating shards
  • initializing_shards: Initializing shards
  • unassigned_shards: Unassigned shards
  • delayed_unassigned_shards: Delayed unassigned shards
  • indices.count: Indices count
  • indices.shards.replication: Replica shards (ratio of replica shards to primary shards)
  • indices.docs.count: Documents count
  • indices.docs.deleted: Deleted documents
  • indices.fielddata.memory_size_in_bytes: Field data size
  • indices.fielddata.evictions: Field data evictions
  • indices.query_cache.cache_size: Query cache size
  • indices.query_cache.cache_count: Query cache count
  • indices.query_cache.evictions: Query cache evictions
  • indices.segments.count: Segment count
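The status and shard metrics above correspond to fields of the standard Elasticsearch cluster health response (`GET _cluster/health`). The following Python sketch shows one way to derive them. The assumption that the four `status-*` metrics act as 0/1 flags, with any status other than green, yellow, or red counted as unknown, is ours and not documented Dynatrace behavior; `process_group_health` is a hypothetical helper.

```python
from __future__ import annotations

from typing import Any

# Fields of a _cluster/health response that match the process group metrics above.
HEALTH_FIELDS = (
    "number_of_nodes",
    "number_of_data_nodes",
    "active_primary_shards",
    "active_shards",
    "relocating_shards",
    "initializing_shards",
    "unassigned_shards",
    "delayed_unassigned_shards",
)


def process_group_health(health: dict[str, Any]) -> dict[str, int]:
    """Turn a _cluster/health response into status flags plus node and shard counts."""
    status = health.get("status")
    if status not in ("green", "yellow", "red"):
        status = "unknown"
    # Assumed encoding: exactly one of the four status flags is 1.
    metrics = {f"status-{s}": int(s == status) for s in ("green", "yellow", "red", "unknown")}
    for field in HEALTH_FIELDS:
        if field in health:
            metrics[field] = int(health[field])
    return metrics
```

In Elasticsearch, a yellow status means all primary shards are allocated but at least one replica is not, so it typically appears together with a nonzero unassigned_shards count, for example on a single-node cluster whose indices request replicas.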

Instance metrics

Each instance metric is listed with its description:

  • node.indices.store.size_in_bytes: Store size
  • node.indices.store.throttle_time_in_millis: Store throttle time
  • node.indices.indexing.throttle_time_in_millis: Indexing throttle time
  • node.indices.indexing.index_time_in_millis: Indexing time
  • node.indices.indexing.index_total: Indexing total
  • node.indices.indexing.delete_total: Indexing delete
  • node.indices.indexing.index_failed: Indexing failed
  • node.indices.indexing.noop_update_total: Indexing noop update total
  • node.indices.search.query_time_in_millis: Query time
  • node.indices.search.query_total: Number of queries
  • node.indices.search.fetch_total: Number of fetches
  • node.indices.search.fetch_time_in_millis: Fetch time
  • node.indices.search.scroll_time_in_millis: Scroll time
  • node.indices.search.scroll_total: Number of scrolls
  • node.indices.search.local_total_time_in_millis: Total search time
  • node.indices.merges.total: Merge total
  • node.indices.merges.total_time_in_millis: Merge total time
  • node.indices.merges.total_docs: Merge total documents
  • node.indices.merges.total_size_in_bytes: Merge total size
  • node.indices.merges.total_stopped_time_in_millis: Merge stopped time
  • node.indices.merges.total_throttled_time_in_millis: Merge throttled time
  • node.indices.merges.total_auto_throttle_in_bytes: Merge auto throttle size
  • node.indices.refresh.total: Indices refresh total
  • node.indices.refresh.total_time_in_millis: Indices refresh time
  • node.indices.flush.total: Indices flush total
  • node.indices.flush.total_time_in_millis: Indices flush time
  • node.indices.warmer.total: Indices warmer total
  • node.indices.warmer.total_time_in_millis: Indices warmer time
  • node.indices.translog.operations: Indices translog operations
  • node.indices.translog.size_in_bytes: Indices translog size
  • node.indices.suggest.total: Indices suggest total
  • node.indices.suggest.time_in_millis: Indices suggest time
  • node.indices.request_cache.memory_size_in_bytes: Indices request cache size
  • node.indices.request_cache.evictions: Indices request cache evictions
  • node.indices.request_cache.hit_count: Indices request cache hit count
  • node.indices.request_cache.miss_count: Indices request cache miss count
  • node.indices.recovery.current_as_source: Indices recovery current as source
  • node.indices.recovery.current_as_target: Indices recovery current as target
  • node.indices.recovery.throttle_time_in_millis: Indices recovery throttle time
  • node.breakers.request.limit_size_in_bytes: Breakers request limit size
  • node.breakers.request.estimated_size_in_bytes: Breakers request estimated size
  • node.breakers.request.overhead: Breakers request overhead
  • node.breakers.request.tripped: Breakers request tripped
  • node.breakers.fielddata.limit_size_in_bytes: Breakers field data limit size
  • node.breakers.fielddata.estimated_size_in_bytes: Breakers field data estimated size
  • node.breakers.fielddata.overhead: Breakers field data overhead
  • node.breakers.fielddata.tripped: Breakers field data tripped
  • node.breakers.parent.limit_size_in_bytes: Breakers parent data limit size
  • node.breakers.parent.estimated_size_in_bytes: Breakers parent data estimated size
  • node.breakers.parent.overhead: Breakers parent data overhead
  • node.breakers.parent.tripped: Breakers parent data tripped
  • node.thread_pool.percolate.queue: Thread pools percolate queue
  • node.thread_pool.percolate.completed: Thread pools percolate completed
  • node.thread_pool.percolate.threads: Thread pools percolate threads
  • node.thread_pool.percolate.rejected: Thread pools percolate rejected
  • node.thread_pool.listener.queue: Thread pools listener queue
  • node.thread_pool.listener.completed: Thread pools listener completed
  • node.thread_pool.listener.threads: Thread pools listener threads
  • node.thread_pool.listener.rejected: Thread pools listener rejected
  • node.thread_pool.search.queue: Thread pools search queue
  • node.thread_pool.search.completed: Thread pools search completed
  • node.thread_pool.search.threads: Thread pools search threads
  • node.thread_pool.search.rejected: Thread pools search rejected
  • node.thread_pool.get.queue: Thread pools get queue
  • node.thread_pool.get.completed: Thread pools get completed
  • node.thread_pool.get.threads: Thread pools get threads
  • node.thread_pool.get.rejected: Thread pools get rejected
  • node.thread_pool.bulk.queue: Thread pools bulk queue
  • node.thread_pool.bulk.completed: Thread pools bulk completed
  • node.thread_pool.bulk.threads: Thread pools bulk threads
  • node.thread_pool.bulk.rejected: Thread pools bulk rejected
  • node.thread_pool.index.queue: Thread pools index queue
  • node.thread_pool.index.completed: Thread pools index completed
  • node.thread_pool.index.threads: Thread pools index threads
  • node.thread_pool.index.rejected: Thread pools index rejected
  • node.thread_pool.force_merge.queue: Thread pools force merge queue
  • node.thread_pool.force_merge.completed: Thread pools force merge completed
  • node.thread_pool.force_merge.threads: Thread pools force merge threads
  • node.thread_pool.force_merge.rejected: Thread pools force merge rejected
  • node.thread_pool.analyze.queue: Thread pools analyze queue
  • node.thread_pool.analyze.completed: Thread pools analyze completed
  • node.thread_pool.analyze.threads: Thread pools analyze threads
  • node.thread_pool.analyze.rejected: Thread pools analyze rejected
  • node.thread_pool.refresh.queue: Thread pools refresh queue
  • node.thread_pool.refresh.completed: Thread pools refresh completed
  • node.thread_pool.refresh.threads: Thread pools refresh threads
  • node.thread_pool.refresh.rejected: Thread pools refresh rejected
  • node.thread_pool.generic.queue: Thread pools generic queue
  • node.thread_pool.generic.completed: Thread pools generic completed
  • node.thread_pool.generic.threads: Thread pools generic threads
  • node.thread_pool.generic.rejected: Thread pools generic rejected
  • node.thread_pool.flush.queue: Thread pools flush queue
  • node.thread_pool.flush.completed: Thread pools flush completed
  • node.thread_pool.flush.threads: Thread pools flush threads
  • node.thread_pool.flush.rejected: Thread pools flush rejected
  • node.thread_pool.write.queue: Thread pools write queue
  • node.thread_pool.write.completed: Thread pools write completed
  • node.thread_pool.write.threads: Thread pools write threads
  • node.thread_pool.write.rejected: Thread pools write rejected
  • node.thread_pool.snapshot.queue: Thread pools snapshot queue
  • node.thread_pool.snapshot.completed: Thread pools snapshot completed
  • node.thread_pool.snapshot.threads: Thread pools snapshot threads
  • node.thread_pool.snapshot.rejected: Thread pools snapshot rejected
  • node.thread_pool.ccr.queue: Thread pools ccr queue
  • node.thread_pool.ccr.completed: Thread pools ccr completed
  • node.thread_pool.ccr.threads: Thread pools ccr threads
  • node.thread_pool.ccr.rejected: Thread pools ccr rejected
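Several instance metrics are most informative in combination. The following Python sketch assumes the metrics have already been collected into a dict keyed by the metric names listed above, and computes three derived values: breaker usage (estimated size divided by limit), the request cache hit ratio, and the thread pools whose cumulative rejected count grew between two samples. The function names are hypothetical, and none of these derived values is claimed to be a Dynatrace metric.

```python
from __future__ import annotations


def breaker_usage(metrics: dict[str, float], breaker: str) -> float:
    """Fraction of a circuit breaker's limit currently in use (estimated / limit)."""
    limit = metrics[f"node.breakers.{breaker}.limit_size_in_bytes"]
    estimated = metrics[f"node.breakers.{breaker}.estimated_size_in_bytes"]
    return estimated / limit if limit else 0.0


def request_cache_hit_ratio(metrics: dict[str, float]) -> float | None:
    """Share of request cache lookups that were hits; None if there were no lookups."""
    hits = metrics.get("node.indices.request_cache.hit_count", 0)
    misses = metrics.get("node.indices.request_cache.miss_count", 0)
    total = hits + misses
    return hits / total if total else None


def new_rejections(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Thread pools whose cumulative 'rejected' counter grew between two samples."""
    prefix, suffix = "node.thread_pool.", ".rejected"
    grown: dict[str, float] = {}
    for key, value in after.items():
        if key.startswith(prefix) and key.endswith(suffix):
            delta = value - before.get(key, 0)
            # A negative delta means the node restarted; it is not a new rejection.
            if delta > 0:
                grown[key[len(prefix):-len(suffix)]] = delta
    return grown
```

A breaker usage approaching 1.0, or a rejected count that keeps growing for the same pool, points at the memory pressure and thread pool bottlenecks described under Further details.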