Elasticsearch monitoring
Dynatrace Elasticsearch monitoring provides a high-level overview of all Elasticsearch components within each monitored cluster in your environment.
Elasticsearch health metrics show the state of your monitored clusters. When a problem occurs, you can see which nodes are affected and drill down into the metrics of individual nodes to find the root cause of problems and potential bottlenecks.
Prerequisites
Elasticsearch monitoring in Dynatrace requires:
- Elasticsearch 2.3+
- Linux or Windows
- OneAgent installed on all Elasticsearch nodes
Docker support
Dynatrace supports Elasticsearch running inside Docker containers with OneAgent version 1.157+.
Image
You need to have an Elasticsearch Docker image, version 6.0.1+.
Container configuration
- All instances must use the same port for the REST API (default: 9200). You can change it with the http.port variable.
- The REST port must be exposed to the host.
Example configuration:
docker run -p 45709:1200 -e "discovery.type=single-node" -e "http.port=1200" docker.elastic.co/elasticsearch/elasticsearch:6.0.1
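In the example, Elasticsearch listens on REST port 1200 inside the container (set with http.port), and -p publishes that port on host port 45709. The Python sketch below builds the same argument list for other port choices. The helper name docker_run_args is hypothetical, and the default image tag is simply the one from the example.

```python
import shlex
from typing import List

DEFAULT_IMAGE = "docker.elastic.co/elasticsearch/elasticsearch:6.0.1"


def docker_run_args(host_port: int, rest_port: int = 9200,
                    image: str = DEFAULT_IMAGE) -> List[str]:
    """Build `docker run` arguments for a single-node Elasticsearch container.

    Hypothetical helper: rest_port becomes Elasticsearch's http.port inside
    the container and is published on host_port so the REST API is
    reachable from the host.
    """
    return [
        "docker", "run",
        "-p", f"{host_port}:{rest_port}",   # expose the REST port to the host
        "-e", "discovery.type=single-node",
        "-e", f"http.port={rest_port}",     # use the same value on every instance
        image,
    ]


# Reproduces the example configuration above (without the optional quotes).
print(shlex.join(docker_run_args(45709, rest_port=1200)))
```

Keeping rest_port as a single parameter makes it harder to give different instances different REST ports by accident.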
Enabling Elasticsearch monitoring globally
With Elasticsearch monitoring enabled globally, Dynatrace automatically collects Elasticsearch metrics whenever a new host running Elasticsearch is detected in your environment.
- In the Dynatrace menu, go to Settings.
- Select Monitoring > Monitored technologies.
- Find the Elasticsearch entry and expand it for editing.
- User and Password are user credentials that must be valid on all Elasticsearch hosts you want to monitor. Leave them empty if no authentication is set up.
- URL is the Elasticsearch URL.
If you would rather configure Elasticsearch monitoring per host, set the global Elasticsearch switch here to Off and select the host settings link to begin configuring Elasticsearch at the host level. For details, see Enabling Elasticsearch monitoring for individual hosts below.
- Select Save to save any changes.
- Turn on the Elasticsearch switch to enable Elasticsearch monitoring globally.
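Before you save, you may want to confirm that the credentials and URL work against your cluster. The following Python sketch is one way to check this yourself. It does not describe how Dynatrace connects. It assumes the cluster accepts HTTP basic authentication and queries the standard GET _cluster/health endpoint. The function names health_request and check_credentials are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Optional


def health_request(url: str, user: Optional[str] = None,
                   password: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for the cluster health endpoint under `url`."""
    req = urllib.request.Request(url.rstrip("/") + "/_cluster/health")
    if user:
        # HTTP basic authentication; skipped when no user is given,
        # mirroring "leave them empty if no authentication is set up".
        token = base64.b64encode(f"{user}:{password or ''}".encode()).decode()
        req.add_header("Authorization", f"Basic {token}")
    return req


def check_credentials(url: str, user: Optional[str] = None,
                      password: Optional[str] = None) -> str:
    """Return the cluster status (green, yellow, or red).

    Raises urllib.error.HTTPError (for example, 401) if the credentials
    are rejected.
    """
    with urllib.request.urlopen(health_request(url, user, password), timeout=5) as resp:
        return json.load(resp)["status"]
```

For example, check_credentials("http://localhost:9200", "elastic", "secret") returns the cluster status. The URL and credentials are placeholders for your own values. Because the global credentials must work on every host, run the check once per host.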
Enabling Elasticsearch monitoring for individual hosts
Dynatrace also offers the option of enabling Elasticsearch monitoring for specific hosts rather than globally.
- If Elasticsearch monitoring is currently switched on, switch it off: go to Settings > Monitoring > Monitored technologies and set the Elasticsearch switch to the Off position.
- In the Dynatrace menu, go to Settings.
- Select Monitoring > Monitoring overview.
- Select the Hosts tab.
- Find the host on which you want to enable Elasticsearch monitoring and select Edit.
- The Monitored technologies section shows:
- A list of technologies that are currently being monitored globally. This list should not include Elasticsearch if you are configuring Elasticsearch monitoring for individual hosts.
- A table of technologies that you can enable on this host, with the columns Technology, Type, Monitoring, and Edit.
- Find Elasticsearch in the Technology column and click in the Edit column to display Elasticsearch host-level configuration settings.
- User and Password are user credentials for Elasticsearch on this host. Leave them empty if no authentication is set up.
- URL is the Elasticsearch URL.
- Select Save to save any changes.
- Turn on the Elasticsearch switch to enable Elasticsearch monitoring for the selected host.
Viewing Elasticsearch monitoring insights
- In the Dynatrace menu, go to Technologies.
- In the Technology overview section, select the Elasticsearch tile.
Individual Elasticsearch clusters are represented as process groups. All detected Elasticsearch process groups are listed in the table at the bottom of the page.
- To view metrics for a specific cluster, locate it in the table and expand its row in the Details column.
A chart shows the number of process group instances over the selected time range.
- To see details, select Process group details.
- On the Process group details page, in addition to system performance and networking metrics, you can select the Technology-specific metrics tab to display Elasticsearch cluster charts and metrics.
- Change the Show chart for selection to chart a different Elasticsearch cluster metric.
- All processes in the selected process group are listed at the bottom.
- Select a process to display the details page for the selected process. In addition to general process status information, it has two Elasticsearch-specific tabs: Elasticsearch metrics and Further details.
- Select the Elasticsearch metrics tab to display charts for Elasticsearch key metrics:
- Indexing (the indexing total over the selected time range) shows the number of indexing operations, which reflects the throughput of indexing.
- Search (the number of queries, fetches, and scrolls over the same time range) indicates how efficient your search operations are. More operations completed in the same time interval indicate better performance.
- Select the Further details tab to display the Process metrics page, which charts essential Elasticsearch metrics. You can filter these charts by cluster and node.
- Breakers
Elasticsearch circuit breakers are thresholds that prevent operations from causing an OutOfMemoryError. Each breaker specifies a limit for how much memory it can use. If the estimated query size is larger than the limit, the circuit breaker trips, the query is aborted, and an exception is returned. Because this happens before any data is loaded, the OutOfMemoryError is avoided. The Breakers charts show Limit size, Estimated size, Overhead, and Tripped.
- Indices
Shows additional in-depth information about Elasticsearch indices. Of particular interest is the Translog chart, which shows whether Elasticsearch is keeping up with incoming data by flushing it to the indices on disk.
- Merge
Can reveal the root cause of problems when a system is under too much load and merging can’t keep up.
- Search
Shows additional in-depth information about Elasticsearch search operations, with performance charts for queries, fetches, and scrolls.
- Thread pools
Shows how much load the system is currently processing. You can see whether you can increase the rate of queries or the volume of writes, and whether one of the thread pools is a bottleneck.
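The breaker check described under Breakers can be modeled in a few lines. This Python sketch is a simplified illustration and not Elasticsearch's implementation. It assumes the raw estimate is multiplied by the overhead constant, added to the memory already reserved, and then compared with the limit. The class and attribute names are hypothetical and were chosen to match the Limit size, Estimated size, Overhead, and Tripped charts.

```python
class CircuitBreakingError(Exception):
    """Raised instead of loading data that would exceed the breaker limit."""


class CircuitBreaker:
    """Toy memory circuit breaker mirroring the Breakers chart values."""

    def __init__(self, limit_size: int, overhead: float = 1.0) -> None:
        self.limit_size = limit_size   # Limit size (bytes)
        self.overhead = overhead       # Overhead multiplier applied to estimates
        self.estimated_size = 0        # Estimated size (bytes) currently reserved
        self.tripped = 0               # Tripped counter

    def reserve(self, raw_bytes: int) -> None:
        """Reserve memory for a query before any data is loaded."""
        estimate = int(raw_bytes * self.overhead)
        if self.estimated_size + estimate > self.limit_size:
            # Trip: count it and abort the query instead of risking OutOfMemoryError.
            self.tripped += 1
            raise CircuitBreakingError(
                f"data too large: {self.estimated_size + estimate} > {self.limit_size}"
            )
        self.estimated_size += estimate

    def release(self, raw_bytes: int) -> None:
        """Release memory reserved by a finished query."""
        self.estimated_size -= int(raw_bytes * self.overhead)
```

With limit_size=1000 and overhead=1.5, reserving 400 bytes twice trips the breaker on the second call (600 + 600 > 1000). The call increments tripped and leaves the first reservation in place.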
Supported metrics
These tables list all supported Elasticsearch metrics. A full description of all Elasticsearch statistics is available at www.elastic.co. Most Elasticsearch metrics are taken directly from Elasticsearch statistics and presented as is, with no additional computation.
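The dotted metric names in the tables below appear to follow the JSON paths of the Elasticsearch stats APIs. For example, node.indices.search.query_total corresponds to nodes.<node_id>.indices.search.query_total in the GET _nodes/stats response. Assuming that mapping holds, the following Python sketch flattens a node stats response into those dotted names. The functions flatten and node_metrics are hypothetical helpers and are not part of Dynatrace or Elasticsearch.

```python
from typing import Any, Dict, Iterator, Tuple


def flatten(stats: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, float]]:
    """Yield (dotted_name, value) for every numeric leaf of a nested stats dict."""
    for key, value in stats.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, name)
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            # Strings such as node names or human-readable sizes are skipped.
            yield name, value


def node_metrics(nodes_stats: Dict[str, Any]) -> Dict[str, Dict[str, float]]:
    """Map each node ID in a _nodes/stats response to its 'node.'-prefixed metrics."""
    return {
        node_id: dict(flatten(node, "node"))
        for node_id, node in nodes_stats["nodes"].items()
    }
```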
Process group metrics
Process group metric | Description |
---|---|
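status-green | Cluster health status green: all primary and replica shards are allocated |
status-yellow | Cluster health status yellow: all primary shards are allocated, but one or more replica shards are not |
status-red | Cluster health status red: one or more primary shards are not allocated |
status-unknown | Cluster health status unknown |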
number_of_nodes | Number of nodes |
number_of_data_nodes | Number of data nodes |
active_primary_shards | Active primary shards |
active_shards | Active shards |
relocating_shards | Relocating shards |
initializing_shards | Initializing shards |
unassigned_shards | Unassigned shards |
delayed_unassigned_shards | Delayed unassigned shards |
indices.count | Indices count |
indices.shards.replication | Replica shards |
indices.docs.count | Documents count |
indices.docs.deleted | Deleted documents |
indices.fielddata.memory_size_in_bytes | Field data size |
indices.fielddata.evictions | Field data evictions |
indices.query_cache.cache_size | Query cache size |
indices.query_cache.cache_count | Query cache count |
indices.query_cache.evictions | Query cache evictions |
indices.segments.count | Segment count |
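The status and shard counters above (status-green through delayed_unassigned_shards) share their names with fields of the GET _cluster/health response, which reports status as green, yellow, or red. This Python sketch shows one hypothetical way to turn such a response into the status-* rows and counters. The one-hot status encoding and the function name health_metrics are assumptions and do not describe documented Dynatrace behavior.

```python
from typing import Any, Dict

HEALTH_COUNTERS = (
    "number_of_nodes", "number_of_data_nodes", "active_primary_shards",
    "active_shards", "relocating_shards", "initializing_shards",
    "unassigned_shards", "delayed_unassigned_shards",
)


def health_metrics(health: Dict[str, Any]) -> Dict[str, int]:
    """Turn a GET _cluster/health response into process-group style metrics.

    Assumed encoding: 1 for the current status and 0 for the others; any
    value other than green, yellow, or red counts as status-unknown.
    """
    status = health.get("status")
    if status not in ("green", "yellow", "red"):
        status = "unknown"
    metrics = {f"status-{s}": int(s == status)
               for s in ("green", "yellow", "red", "unknown")}
    for counter in HEALTH_COUNTERS:
        if counter in health:
            metrics[counter] = health[counter]
    return metrics
```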
Instance metrics
Instance metric | Description |
---|---|
node.indices.store.size_in_bytes | Store size |
node.indices.store.throttle_time_in_millis | Store throttle time |
node.indices.indexing.throttle_time_in_millis | Indexing throttle time |
node.indices.indexing.index_time_in_millis | Indexing time |
node.indices.indexing.index_total | Indexing total |
node.indices.indexing.delete_total | Indexing delete total |
node.indices.indexing.index_failed | Indexing failed |
node.indices.indexing.noop_update_total | Indexing noop update total |
node.indices.search.query_time_in_millis | Query time |
node.indices.search.query_total | Number of queries |
node.indices.search.fetch_total | Number of fetches |
node.indices.search.fetch_time_in_millis | Fetch time |
node.indices.search.scroll_time_in_millis | Scroll time |
node.indices.search.scroll_total | Number of scrolls |
node.indices.search.local_total_time_in_millis | Total search time |
node.indices.merges.total | Merge total |
node.indices.merges.total_time_in_millis | Merge total time |
node.indices.merges.total_docs | Merge total documents |
node.indices.merges.total_size_in_bytes | Merge total size |
node.indices.merges.total_stopped_time_in_millis | Merge stopped time |
node.indices.merges.total_throttled_time_in_millis | Merge throttled time |
node.indices.merges.total_auto_throttle_in_bytes | Merge auto throttle size |
node.indices.refresh.total | Indices refresh total |
node.indices.refresh.total_time_in_millis | Indices refresh time |
node.indices.flush.total | Indices flush total |
node.indices.flush.total_time_in_millis | Indices flush time |
node.indices.warmer.total | Indices warmer total |
node.indices.warmer.total_time_in_millis | Indices warmer time |
node.indices.translog.operations | Indices translog operations |
node.indices.translog.size_in_bytes | Indices translog size |
node.indices.suggest.total | Indices suggest total |
node.indices.suggest.time_in_millis | Indices suggest time |
node.indices.request_cache.memory_size_in_bytes | Indices request cache size |
node.indices.request_cache.evictions | Indices request cache evictions |
node.indices.request_cache.hit_count | Indices request cache hit count |
node.indices.request_cache.miss_count | Indices request cache miss count |
node.indices.recovery.current_as_source | Indices recovery current as source |
node.indices.recovery.current_as_target | Indices recovery current as target |
node.indices.recovery.throttle_time_in_millis | Indices recovery throttle time |
node.breakers.request.limit_size_in_bytes | Breakers request limit size |
node.breakers.request.estimated_size_in_bytes | Breakers request estimated size |
node.breakers.request.overhead | Breakers request overhead |
node.breakers.request.tripped | Breakers request tripped |
node.breakers.fielddata.limit_size_in_bytes | Breakers field data limit size |
node.breakers.fielddata.estimated_size_in_bytes | Breakers field data estimated size |
node.breakers.fielddata.overhead | Breakers field data overhead |
node.breakers.fielddata.tripped | Breakers field data tripped |
node.breakers.parent.limit_size_in_bytes | Breakers parent limit size |
node.breakers.parent.estimated_size_in_bytes | Breakers parent estimated size |
node.breakers.parent.overhead | Breakers parent overhead |
node.breakers.parent.tripped | Breakers parent tripped |
node.thread_pool.percolate.queue | Thread pools percolate queue |
node.thread_pool.percolate.completed | Thread pools percolate completed |
node.thread_pool.percolate.threads | Thread pools percolate threads |
node.thread_pool.percolate.rejected | Thread pools percolate rejected |
node.thread_pool.listener.queue | Thread pools listener queue |
node.thread_pool.listener.completed | Thread pools listener completed |
node.thread_pool.listener.threads | Thread pools listener threads |
node.thread_pool.listener.rejected | Thread pools listener rejected |
node.thread_pool.search.queue | Thread pools search queue |
node.thread_pool.search.completed | Thread pools search completed |
node.thread_pool.search.threads | Thread pools search threads |
node.thread_pool.search.rejected | Thread pools search rejected |
node.thread_pool.get.queue | Thread pools get queue |
node.thread_pool.get.completed | Thread pools get completed |
node.thread_pool.get.threads | Thread pools get threads |
node.thread_pool.get.rejected | Thread pools get rejected |
node.thread_pool.bulk.queue | Thread pools bulk queue |
node.thread_pool.bulk.completed | Thread pools bulk completed |
node.thread_pool.bulk.threads | Thread pools bulk threads |
node.thread_pool.bulk.rejected | Thread pools bulk rejected |
node.thread_pool.index.queue | Thread pools index queue |
node.thread_pool.index.completed | Thread pools index completed |
node.thread_pool.index.threads | Thread pools index threads |
node.thread_pool.index.rejected | Thread pools index rejected |
node.thread_pool.force_merge.queue | Thread pools force merge queue |
node.thread_pool.force_merge.completed | Thread pools force merge completed |
node.thread_pool.force_merge.threads | Thread pools force merge threads |
node.thread_pool.force_merge.rejected | Thread pools force merge rejected |
node.thread_pool.analyze.queue | Thread pools analyze queue |
node.thread_pool.analyze.completed | Thread pools analyze completed |
node.thread_pool.analyze.threads | Thread pools analyze threads |
node.thread_pool.analyze.rejected | Thread pools analyze rejected |
node.thread_pool.refresh.queue | Thread pools refresh queue |
node.thread_pool.refresh.completed | Thread pools refresh completed |
node.thread_pool.refresh.threads | Thread pools refresh threads |
node.thread_pool.refresh.rejected | Thread pools refresh rejected |
node.thread_pool.generic.queue | Thread pools generic queue |
node.thread_pool.generic.completed | Thread pools generic completed |
node.thread_pool.generic.threads | Thread pools generic threads |
node.thread_pool.generic.rejected | Thread pools generic rejected |
node.thread_pool.flush.queue | Thread pools flush queue |
node.thread_pool.flush.completed | Thread pools flush completed |
node.thread_pool.flush.threads | Thread pools flush threads |
node.thread_pool.flush.rejected | Thread pools flush rejected |
node.thread_pool.write.queue | Thread pools write queue |
node.thread_pool.write.completed | Thread pools write completed |
node.thread_pool.write.threads | Thread pools write threads |
node.thread_pool.write.rejected | Thread pools write rejected |
node.thread_pool.snapshot.queue | Thread pools snapshot queue |
node.thread_pool.snapshot.completed | Thread pools snapshot completed |
node.thread_pool.snapshot.threads | Thread pools snapshot threads |
node.thread_pool.snapshot.rejected | Thread pools snapshot rejected |
node.thread_pool.ccr.queue | Thread pools CCR queue |
node.thread_pool.ccr.completed | Thread pools CCR completed |
node.thread_pool.ccr.threads | Thread pools CCR threads |
node.thread_pool.ccr.rejected | Thread pools CCR rejected |
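Elasticsearch node statistics such as query_total and hit_count are cumulative counters. Per-interval figures like queries per second therefore come from the difference between two samples. The Python sketch below derives a few such figures from two snapshots keyed by the metric names in the table. The function name search_summary and the chosen ratios are illustrative assumptions. The sketch also assumes the node did not restart between samples, because a restart resets the counters.

```python
from typing import Dict, Optional

Sample = Dict[str, float]  # dotted metric name -> cumulative counter value


def delta(before: Sample, after: Sample, name: str) -> float:
    """Increase of a cumulative counter between two samples (no restart assumed)."""
    return after[name] - before[name]


def search_summary(before: Sample, after: Sample,
                   seconds: float) -> Dict[str, Optional[float]]:
    """Derive query throughput, average query latency, cache hit ratio, and rejections."""
    queries = delta(before, after, "node.indices.search.query_total")
    query_ms = delta(before, after, "node.indices.search.query_time_in_millis")
    hits = delta(before, after, "node.indices.request_cache.hit_count")
    misses = delta(before, after, "node.indices.request_cache.miss_count")
    return {
        "queries_per_second": queries / seconds,
        # None when no queries (or cache lookups) happened in the interval.
        "avg_query_ms": query_ms / queries if queries else None,
        "request_cache_hit_ratio": hits / (hits + misses) if hits + misses else None,
        "search_rejections": delta(before, after, "node.thread_pool.search.rejected"),
    }
```

A rising search_rejections value together with a growing node.thread_pool.search.queue is one way to spot the thread pool bottlenecks mentioned under Thread pools.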