Adaptive traffic management for Dynatrace Managed
PurePath® distributed traces are end-to-end transactions captured by OneAgent. Each minute, a statistically relevant number of end-to-end distributed traces is captured within each monitored process. Each trace contains code-level and business insights derived from service-level calls to multiple tiers. Because each trace is captured fully and end-to-end, second- and third-level tiers often capture more total service calls than entry-point processes.
When the volume of transactions is high, capturing all traces can increase network bandwidth demands. OneAgent provides a built-in limiter to manage such cases. Each process monitored by OneAgent is allowed to start only a given number of distributed traces per minute. Once the quota is reached, Adaptive traffic management decides which traces are captured so that the monitored traffic is used as effectively as possible.
How is Adaptive traffic management different from other sampling mechanisms?
In typical applications, the distribution of requests is not even. It's rather a combination of: a large number of unique URLs, a medium number of important requests, and, finally, a few kinds of requests that make up the majority of the traffic (for example, image requests or status checks).
With Adaptive traffic management, OneAgent first calculates a list of top requests starting each minute, from which it then captures:
- Most traces of unique and rare requests.
- A significant but lower volume of highly frequent requests.
Because the sampling is not random, all important data is captured while maintaining a statistically valid sample set.
You can see the effect of Adaptive traffic management in the distributed trace list. If OneAgent is sampling and not all requests are captured, captured traces indicate that similar requests have not been captured by showing the message "[amount] more like this" in the distributed trace list.
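The idea of capturing most rare requests while capping highly frequent ones can be illustrated with a small sketch. This is not OneAgent's actual algorithm, which the text does not specify. It is a hypothetical max-min fair allocation, and the function name `allocate_capture_budget`, the request names, and the counts are all assumptions made for illustration.

```python
def allocate_capture_budget(counts: dict[str, int], quota: int) -> dict[str, int]:
    """Split a per-minute trace quota across request types (illustrative only).

    Request types are visited from rarest to most frequent. Each one receives
    at most an equal share of the budget that is still unassigned, so rare
    types are captured in full and frequent types are capped.
    """
    remaining = quota
    types_left = len(counts)
    allocation: dict[str, int] = {}
    for name, count in sorted(counts.items(), key=lambda item: item[1]):
        fair_share = remaining // types_left
        captured = min(count, fair_share)
        allocation[name] = captured
        remaining -= captured
        types_left -= 1
    return allocation


# Hypothetical request counts observed by one process in one minute.
observed = {"/checkout": 40, "/search": 300, "/health": 5000, "/img": 20000}
captured = allocate_capture_budget(observed, quota=1000)

# Requests that were not captured would surface as "[amount] more like this".
more_like_this = {name: observed[name] - captured[name] for name in observed}
```

With these assumed numbers, `/checkout` and `/search` are captured completely (40 and 300 traces), while `/health` and `/img` are each limited to 330 traces, so the total stays at the quota of 1,000.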
In this way, OneAgent reduces the data sent to your environment, ensuring that the amount of captured traces stays within the limits of your Dynatrace agreement.
Using Adaptive traffic management to reduce the volume of processed data saves network bandwidth and, in Dynatrace Managed environments, the CPU, memory, network, and storage resources that would otherwise be required to process and store the additional data.
Quota per process
In Dynatrace Managed, the quota of new distributed traces/min that each process can send to Dynatrace is 1,000. Because traffic management depends on your application architecture, network traffic is limited for high-volume entry points (such as a load balancer or NGINX) and spikes might occur.
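As a rough illustration of what the per-process quota means for a busy entry point, the following sketch computes the largest share of new traces that a single process could capture in one minute. The function name and the example request rates are assumptions. Only the default of 1,000 new traces per minute comes from the text above.

```python
def max_capture_ratio(new_traces_per_minute: int, quota_per_minute: int = 1000) -> float:
    """Upper bound on the fraction of new traces one process can capture."""
    if new_traces_per_minute <= 0:
        return 1.0  # nothing to capture, so nothing is dropped
    return min(1.0, quota_per_minute / new_traces_per_minute)


# A hypothetical load balancer starting 5,000 traces per minute.
busy_entry_point = max_capture_ratio(5000)

# A hypothetical back-office service starting 400 traces per minute.
quiet_service = max_capture_ratio(400)
```

Under these assumptions, the busy entry point can capture at most 20% of its new traces, while the quiet service stays below the quota and is captured in full.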
Adaptive capture control
You can manage the quota of new entry-point distributed traces captured per minute in one of two ways:
- Environment-wide
  - You can reduce the environment quota and, thereby, the percentage of monitored incoming traffic.
  - You can increase the environment quota up to 100,000 to ensure higher fidelity.
    This effectively instructs OneAgent to capture all requests, even rare ones, within high-volume environments.
    Setting this value too high can cause a resource shortage and increase hardware expenditures.
- For each process or process group
  Note that environment administrators can additionally modify the environment quota. Go to Settings > Server-side service monitoring > Deep monitoring.
Adjusting this setting can help you in specific cases, for example, if a Dynatrace Managed environment for load testing is consuming too many network, disk, and CPU resources, and you'd rather use those resources for production monitoring. Adjustments to settings are taken into account transparently in all analyses and do not affect metrics or service analysis features, with the exception of the distributed traces list.
Monitor adaptive traffic usage and thresholds
You can use the preset dashboard OneAgent Traces - Adaptive traffic management to track usage and thresholds of Adaptive traffic management.
Adaptive load reduction
Adaptive load reduction is a dynamic mechanism that targets environments with a high volume of traffic compared to their assigned host units. Because Dynatrace Managed environments can process a limited number of service calls per minute (depending on the node CPU amount and memory availability), this is particularly helpful for managing sporadic spikes in the volume of processed distributed traces.
When the number of service calls exceeds what an environment can process, adaptive load reduction is triggered:
- New incoming distributed traces are skipped at random, gradually reducing the number of processed distributed traces.
  Note that service calls of full distributed traces already in progress are not targeted.
- The number of skipped distributed traces is taken into account to ensure stable statistical validity for all metrics, charts, baselining, and events.
- You are informed about the reduction of processed data by
  - An alert message in the Dynatrace web UI:
    Server [amount] activated adaptive load reduction
  - A message in the distributed trace list:
    [amount] more like this
Adaptive load reduction safeguards your Dynatrace environment from sporadic traffic spikes.
While occasional activation (for example, to cover spikes) will not harm the fidelity of your monitoring data, consistent use for intervals of 15 minutes or longer can impact the accuracy of your monitoring data and metrics because not all data is processed.
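The combination of random skipping and accounting for skipped traces can be sketched as follows. This is a simplified model under stated assumptions, not the Dynatrace server implementation: the function names, the fixed per-minute `capacity`, and the use of a single keep probability are all hypothetical.

```python
import random


def process_with_load_reduction(trace_durations_ms, capacity, seed=0):
    """Randomly skip new traces once the incoming volume exceeds capacity.

    Returns the processed durations and the number of skipped traces.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    incoming = len(trace_durations_ms)
    keep_probability = min(1.0, capacity / incoming) if incoming else 1.0
    processed = [d for d in trace_durations_ms if rng.random() < keep_probability]
    skipped = incoming - len(processed)
    return processed, skipped


def estimated_request_count(processed, skipped):
    """Request count that still reflects real traffic despite skipping."""
    return len(processed) + skipped


def estimated_mean_duration_ms(processed):
    """Mean response time estimated from the randomly kept sample."""
    return sum(processed) / len(processed) if processed else 0.0
```

Because traces are skipped at random in this model, the kept sample remains representative, and adding the skipped count back keeps request counts aligned with the real traffic. The longer the reduction stays active, the more the estimates rely on a smaller sample, which matches the caution about prolonged activation above.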