Elasticsearch

Dynatrace Elasticsearch monitoring provides a high-level overview of all Elasticsearch components within each monitored cluster in your environment.

Elasticsearch health metrics show the state of your monitored clusters. When a problem occurs, you can see which nodes are affected and drill down into the metrics of individual nodes to find the root cause of problems and potential bottlenecks.

Prerequisites

Elasticsearch monitoring in Dynatrace requires:

  • Elasticsearch 2.3+
  • Linux OS or Windows
  • OneAgent installed on all Elasticsearch nodes
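
To check the version prerequisite by hand, the sketch below compares the version reported by an Elasticsearch node against the 2.3 minimum. It assumes you have already fetched the JSON body returned by the node's root REST endpoint (for example http://localhost:9200/, a hypothetical address), which reports the version under version.number. The function names are illustrative and are not part of Dynatrace or Elasticsearch.

```python
import json

# "Elasticsearch 2.3+" from the prerequisites list
MIN_VERSION = (2, 3)


def parse_version(number):
    """Turn a version string such as '6.0.1' into a tuple of ints, e.g. (6, 0, 1)."""
    core = number.split("-", 1)[0]  # drop suffixes such as "-SNAPSHOT"
    return tuple(int(part) for part in core.split("."))


def meets_minimum(root_response_json, minimum=MIN_VERSION):
    """Return True if the node's reported version is at least `minimum`.

    `root_response_json` is the raw JSON text returned by the node's
    root REST endpoint; fetching it is left to the caller.
    """
    info = json.loads(root_response_json)
    return parse_version(info["version"]["number"]) >= minimum
```

Tuples compare element by element, so a three-part version such as 2.3.0 is correctly treated as satisfying the two-part minimum.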

Docker support

Dynatrace supports Elasticsearch running inside a Docker container with OneAgent version 1.157+.

Image

You need an Elasticsearch Docker image, version 6.0.1+.

Container configuration

  • All instances must use the same port for the REST API (default: 9200). You can change the port via the http.port variable.
  • The REST port must be exposed to the host.

Example configuration, which sets the REST port to 1200 inside the container and publishes it on host port 45709:

    docker run -p 45709:1200 -e "discovery.type=single-node" -e "http.port=1200" docker.elastic.co/elasticsearch/elasticsearch:6.0.1
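
A mismatch between http.port and the container side of the -p mapping is an easy mistake to make. As a minimal sketch, assuming you have the -p values of a container at hand as strings, the hypothetical helpers below report which host port (if any) publishes the REST port. Only the common HOST:CONTAINER and IP:HOST:CONTAINER forms with a single numeric host port are handled; port ranges and empty host ports are out of scope.

```python
def parse_publish(spec):
    """Split a `-p` value into (host_port, container_port).

    Accepts 'HOST:CONTAINER' and 'IP:HOST:CONTAINER', with an optional
    '/tcp' or '/udp' suffix on the container port.
    """
    parts = spec.split(":")
    if len(parts) < 2:
        raise ValueError("expected HOST_PORT:CONTAINER_PORT, got %r" % spec)
    container = parts[-1].split("/", 1)[0]  # drop a protocol suffix
    return int(parts[-2]), int(container)


def rest_port_exposed(http_port, publish_specs):
    """Return the host port that publishes `http_port`, or None if none does."""
    for spec in publish_specs:
        host, container = parse_publish(spec)
        if container == http_port:
            return host
    return None
```

For the example configuration, rest_port_exposed(1200, ["45709:1200"]) yields host port 45709, while passing the default 9200 with the same mapping yields None because nothing publishes that container port.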

Enabling Elasticsearch monitoring globally

With Elasticsearch monitoring enabled globally, Dynatrace automatically collects Elasticsearch metrics whenever a new host running Elasticsearch is detected in your environment.

  1. In the navigation menu, select Settings.

  2. Select Monitoring > Monitored technologies.

  3. Find the Elasticsearch entry and expand it for editing.

    • User and Password are credentials that must be valid on every Elasticsearch host that you want to monitor. Leave them empty if no authentication is set up.
    • URL is the Elasticsearch URL.
    Note

    If you decide to configure Elasticsearch per host rather than globally, set the global Elasticsearch switch here to the Off position and select the host settings link to begin configuring Elasticsearch at the host level. See Enabling Elasticsearch monitoring for individual hosts below for details.

  4. Select Save to save any changes.

  5. Set the Elasticsearch switch to the On position to enable Elasticsearch monitoring globally.
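
Before saving the User, Password, and URL values, it can help to confirm that they work against a node. The sketch below builds a request for the Elasticsearch cluster health endpoint using only the Python standard library. It assumes HTTP Basic authentication and a hypothetical URL such as http://localhost:9200; this page doesn't specify which authentication scheme Dynatrace uses, so treat that as an assumption.

```python
import base64
import urllib.request


def build_health_request(url, user="", password=""):
    """Build a GET request for <url>/_cluster/health.

    A Basic Authorization header is added only when `user` is non-empty,
    mirroring the "leave them empty if no authentication is set up" rule.
    """
    request = urllib.request.Request(url.rstrip("/") + "/_cluster/health")
    if user:
        raw = ("%s:%s" % (user, password)).encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        request.add_header("Authorization", "Basic " + token)
    return request


# To send the request (requires a reachable node):
#     with urllib.request.urlopen(build_health_request(url, user, pw), timeout=5) as resp:
#         print(resp.status)
```

A 200 response suggests the URL and credentials are usable; a 401 response points to a credentials problem rather than a connectivity problem.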

Enabling Elasticsearch monitoring for individual hosts

Dynatrace also offers the option of enabling Elasticsearch monitoring for specific hosts rather than globally.

  1. If Elasticsearch monitoring is currently switched on globally, switch it off: go to Settings > Monitoring > Monitored technologies and set the Elasticsearch switch to the Off position.
  2. In the navigation menu, select Settings.
  3. Select Monitoring > Monitoring overview.
  4. Select the Hosts tab.
  5. Find the host on which you want to enable Elasticsearch monitoring and select Edit.
  6. The Monitored technologies section shows:
    • A list of technologies that are currently being monitored globally. This list should not include Elasticsearch if you are configuring Elasticsearch monitoring for individual hosts.
    • A table of technologies that you can enable on this host: Technology, Type, Monitoring, Edit.
  7. Find Elasticsearch in the Technology column and click in the Edit column to display Elasticsearch host-level configuration settings.
    • User and Password are user credentials for Elasticsearch on this host. Leave them empty if no authentication is set up.
    • URL is the Elasticsearch URL.
  8. Select Save to save any changes.
  9. Set the Elasticsearch switch to the On position to enable Elasticsearch monitoring for the selected host.

Viewing Elasticsearch monitoring insights

  1. In the navigation menu, select Technologies.
  2. In the Technology overview section, select the Elasticsearch tile.
    Individual Elasticsearch clusters are represented as process groups. All detected Elasticsearch process groups are listed in the table at the bottom of the page.
  3. To view metrics for a specific cluster, locate it in the table and click in the Details column to expand that row.
    A chart shows the number of process group instances over the selected time range.
  4. To see details, select the Process group details button.
  5. On the Process group details page, in addition to system performance and networking metrics, you can select the Technology-specific metrics tab to display Elasticsearch cluster charts and metrics.
    • Change the Show chart for selection to chart a different Elasticsearch cluster metric.
    • All processes in the selected process group are listed at the bottom.
  6. Select a process to display the details page for the selected process. In addition to general process status information, it has two Elasticsearch-specific tabs: Elasticsearch metrics and Further details.
  7. Select the Elasticsearch metrics tab to display charts for Elasticsearch key metrics:
    • Indexing (indexing total over the selected time range) shows the effectiveness of all indexing operations.
    • Search (number of queries, fetches, and scrolls over the same time range) is an indicator of how efficient your search operations are. More operations in a shorter time interval indicate better performance.
  8. Select the Further details tab to display the Process metrics page, which charts essential Elasticsearch metrics. You can filter these charts by cluster and node.
    • Breakers
      Elasticsearch circuit breakers are thresholds used to prevent operations from causing an OutOfMemoryError. Each breaker specifies a limit for how much memory it can use. If the estimated query size is larger than the limit, the circuit breaker is tripped, the query is aborted, and an exception is returned. This happens before data is loaded, which means that an OutOfMemoryError is avoided.
      • Limit size
      • Estimated size
      • Overhead
      • Tripped
    • Indices
      Shows additional in-depth information about Elasticsearch indices. Of particular interest is the Translog chart, which shows whether Elasticsearch is keeping up with the data coming in by flushing it out to the indices on disk.
    • Merge
      Merge charts can reveal the root cause of problems when a system is under too much load and merging can’t keep up.
    • Search
      Shows additional in-depth information around Elasticsearch search operations, with performance charts for queries, fetches, and scrolls.
    • Thread pools
      Shows details about how much load the system is currently processing. Enables you to see whether you can increase the rate of queries or the number of writes, and whether there’s a bottleneck in one of the thread pools.
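
The Breakers and Thread pools charts can be approximated from raw node statistics. As a rough sketch, assuming you hold one node's entry from the Elasticsearch node stats API (GET /_nodes/stats) as a Python dict, the hypothetical helpers below compute how close each breaker is to its limit and list the thread pools that have rejected work. This illustrates how to read the numbers; it is not how Dynatrace itself processes them.

```python
def breaker_usage(node_stats):
    """Return {breaker_name: estimated_size / limit_size} for one node.

    Breakers without a positive limit are skipped, since a ratio
    would be meaningless for them.
    """
    usage = {}
    for name, breaker in node_stats.get("breakers", {}).items():
        limit = breaker.get("limit_size_in_bytes", 0)
        if limit > 0:
            usage[name] = breaker.get("estimated_size_in_bytes", 0) / limit
    return usage


def pools_with_rejections(node_stats):
    """Return the sorted names of thread pools whose rejected count is above zero."""
    pools = node_stats.get("thread_pool", {})
    return sorted(name for name, pool in pools.items() if pool.get("rejected", 0) > 0)


# Hand-written sample in the shape of one node's stats entry
sample_node = {
    "breakers": {
        "request": {"limit_size_in_bytes": 1000, "estimated_size_in_bytes": 250,
                    "overhead": 1.0, "tripped": 0},
    },
    "thread_pool": {
        "search": {"threads": 4, "queue": 12, "rejected": 3, "completed": 900},
        "write": {"threads": 4, "queue": 0, "rejected": 0, "completed": 500},
    },
}
```

A breaker ratio approaching 1.0 means the next large request may trip the breaker, and a pool with a growing rejected count is a likely bottleneck.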

Supported metrics

These tables list all supported Elasticsearch metrics. A full description of all Elasticsearch statistics is available at www.elastic.co. Most Elasticsearch metrics are taken directly from Elasticsearch statistics and presented as is, with no additional computation.

Process group metrics

Process group metric Description
status-green Status green
status-yellow Status yellow
status-red Status red
status-unknown Status unknown
number_of_nodes Number of nodes
number_of_data_nodes Number of data nodes
active_primary_shards Active primary shards
active_shards Active shards
relocating_shards Relocating shards
initializing_shards Initializing shards
unassigned_shards Unassigned shards
delayed_unassigned_shards Delayed unassigned shards
indices.count Indices count
indices.shards.replication Replica shards
indices.docs.count Documents count
indices.docs.deleted Deleted documents
indices.fielddata.memory_size_in_bytes Field data size
indices.fielddata.evictions Field data evictions
indices.query_cache.cache_size Query cache size
indices.query_cache.cache_count Query cache count
indices.query_cache.evictions Query cache evictions
indices.segments.count Segment count
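
Several of these names match fields of the Elasticsearch cluster health API (GET /_cluster/health), which reports a status of green, yellow, or red together with node and shard counts. The sketch below maps such a response onto the metric names in the table. The 0/1 encoding of the status-* metrics is an assumption made for illustration; this page doesn't state how Dynatrace derives them.

```python
import json

KNOWN_STATUSES = ("green", "yellow", "red")

# Cluster health fields that share their name with a process group metric
COPIED_FIELDS = (
    "number_of_nodes",
    "number_of_data_nodes",
    "active_primary_shards",
    "active_shards",
    "relocating_shards",
    "initializing_shards",
    "unassigned_shards",
    "delayed_unassigned_shards",
)


def health_to_metrics(health_json):
    """Map a cluster health JSON document to a flat {metric_name: value} dict."""
    health = json.loads(health_json)
    status = health.get("status")
    # Assumed encoding: exactly one status-* metric is 1, the others are 0
    metrics = {"status-" + s: int(status == s) for s in KNOWN_STATUSES}
    metrics["status-unknown"] = int(status not in KNOWN_STATUSES)
    for field in COPIED_FIELDS:
        if field in health:
            metrics[field] = health[field]
    return metrics
```

With this encoding, summing a status-* metric across clusters gives the number of clusters currently in that state.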

Instance metrics

Instance metric Description
node.indices.store.size_in_bytes Store size
node.indices.store.throttle_time_in_millis Store throttle time
node.indices.indexing.throttle_time_in_millis Indexing throttle time
node.indices.indexing.index_time_in_millis Indexing time
node.indices.indexing.index_total Indexing total
node.indices.indexing.delete_total Indexing delete
node.indices.indexing.index_failed Indexing failed
node.indices.indexing.noop_update_total Indexing noop update total
node.indices.search.query_time_in_millis Query time
node.indices.search.query_total Number of queries
node.indices.search.fetch_total Number of fetches
node.indices.search.fetch_time_in_millis Fetch time
node.indices.search.scroll_time_in_millis Scroll time
node.indices.search.scroll_total Number of scrolls
node.indices.search.local_total_time_in_millis Total search time
node.indices.merges.total Merge total
node.indices.merges.total_time_in_millis Merge total time
node.indices.merges.total_docs Merge total documents
node.indices.merges.total_size_in_bytes Merge total size
node.indices.merges.total_stopped_time_in_millis Merge stopped time
node.indices.merges.total_throttled_time_in_millis Merge throttled time
node.indices.merges.total_auto_throttle_in_bytes Merge auto throttle size
node.indices.refresh.total Indices refresh total
node.indices.refresh.total_time_in_millis Indices refresh time
node.indices.flush.total Indices flush total
node.indices.flush.total_time_in_millis Indices flush time
node.indices.warmer.total Indices warmer total
node.indices.warmer.total_time_in_millis Indices warmer time
node.indices.translog.operations Indices translog operations
node.indices.translog.size_in_bytes Indices translog size
node.indices.suggest.total Indices suggest total
node.indices.suggest.time_in_millis Indices suggest time
node.indices.request_cache.memory_size_in_bytes Indices request cache size
node.indices.request_cache.evictions Indices request cache evictions
node.indices.request_cache.hit_count Indices request cache hit count
node.indices.request_cache.miss_count Indices request cache miss count
node.indices.recovery.current_as_source Indices recovery current as source
node.indices.recovery.current_as_target Indices recovery current as target
node.indices.recovery.throttle_time_in_millis Indices recovery throttle time
node.breakers.request.limit_size_in_bytes Breakers request limit size
node.breakers.request.estimated_size_in_bytes Breakers request estimated size
node.breakers.request.overhead Breakers request overhead
node.breakers.request.tripped Breakers request tripped
node.breakers.fielddata.limit_size_in_bytes Breakers field data limit size
node.breakers.fielddata.estimated_size_in_bytes Breakers field data estimated size
node.breakers.fielddata.overhead Breakers field data overhead
node.breakers.fielddata.tripped Breakers field data tripped
node.breakers.parent.limit_size_in_bytes Breakers parent data limit size
node.breakers.parent.estimated_size_in_bytes Breakers parent data estimated size
node.breakers.parent.overhead Breakers parent data overhead
node.breakers.parent.tripped Breakers parent data tripped
node.thread_pool.percolate.queue Thread pools percolate queue
node.thread_pool.percolate.completed Thread pools percolate completed
node.thread_pool.percolate.threads Thread pools percolate threads
node.thread_pool.percolate.rejected Thread pools percolate rejected
node.thread_pool.listener.queue Thread pools listener queue
node.thread_pool.listener.completed Thread pools listener completed
node.thread_pool.listener.threads Thread pools listener threads
node.thread_pool.listener.rejected Thread pools listener rejected
node.thread_pool.search.queue Thread pools search queue
node.thread_pool.search.completed Thread pools search completed
node.thread_pool.search.threads Thread pools search threads
node.thread_pool.search.rejected Thread pools search rejected
node.thread_pool.get.queue Thread pools get queue
node.thread_pool.get.completed Thread pools get completed
node.thread_pool.get.threads Thread pools get threads
node.thread_pool.get.rejected Thread pools get rejected
node.thread_pool.bulk.queue Thread pools bulk queue
node.thread_pool.bulk.completed Thread pools bulk completed
node.thread_pool.bulk.threads Thread pools bulk threads
node.thread_pool.bulk.rejected Thread pools bulk rejected
node.thread_pool.index.queue Thread pools index queue
node.thread_pool.index.completed Thread pools index completed
node.thread_pool.index.threads Thread pools index threads
node.thread_pool.index.rejected Thread pools index rejected
node.thread_pool.force_merge.queue Thread pools force merge queue
node.thread_pool.force_merge.completed Thread pools force merge completed
node.thread_pool.force_merge.threads Thread pools force merge threads
node.thread_pool.force_merge.rejected Thread pools force merge rejected
node.thread_pool.analyze.queue Thread pools analyze queue
node.thread_pool.analyze.completed Thread pools analyze completed
node.thread_pool.analyze.threads Thread pools analyze threads
node.thread_pool.analyze.rejected Thread pools analyze rejected
node.thread_pool.refresh.queue Thread pools refresh queue
node.thread_pool.refresh.completed Thread pools refresh completed
node.thread_pool.refresh.threads Thread pools refresh threads
node.thread_pool.refresh.rejected Thread pools refresh rejected
node.thread_pool.generic.queue Thread pools generic queue
node.thread_pool.generic.completed Thread pools generic completed
node.thread_pool.generic.threads Thread pools generic threads
node.thread_pool.generic.rejected Thread pools generic rejected
node.thread_pool.flush.queue Thread pools flush queue
node.thread_pool.flush.completed Thread pools flush completed
node.thread_pool.flush.threads Thread pools flush threads
node.thread_pool.flush.rejected Thread pools flush rejected
node.thread_pool.write.queue Thread pools write queue
node.thread_pool.write.completed Thread pools write completed
node.thread_pool.write.threads Thread pools write threads
node.thread_pool.write.rejected Thread pools write rejected
node.thread_pool.snapshot.queue Thread pools snapshot queue
node.thread_pool.snapshot.completed Thread pools snapshot completed
node.thread_pool.snapshot.threads Thread pools snapshot threads
node.thread_pool.snapshot.rejected Thread pools snapshot rejected
node.thread_pool.ccr.queue Thread pools ccr queue
node.thread_pool.ccr.completed Thread pools ccr completed
node.thread_pool.ccr.threads Thread pools ccr threads
node.thread_pool.ccr.rejected Thread pools ccr rejected
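
In the Elasticsearch node statistics, counters such as query_total and query_time_in_millis accumulate from node start, so a single reading says little about current performance. A common way to interpret them is to compare two samples. The sketch below derives an average time per query and a query rate from two such samples. These derived values are an illustration only and are not metrics that this page says Dynatrace reports; the dict keys are the trailing parts of the metric names above.

```python
def average_query_time_ms(previous, current):
    """Average milliseconds per query between two samples of cumulative counters.

    Returns None when no queries ran in between, or when the counters
    went backwards (for example after a node restart).
    """
    queries = current["query_total"] - previous["query_total"]
    if queries <= 0:
        return None
    elapsed = current["query_time_in_millis"] - previous["query_time_in_millis"]
    return elapsed / queries


def queries_per_second(previous, current, interval_seconds):
    """Query rate between two samples taken `interval_seconds` apart."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (current["query_total"] - previous["query_total"]) / interval_seconds
```

The same delta approach applies to the other total and time_in_millis pairs in the table, such as fetch, scroll, merge, refresh, and flush.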