Adaptive traffic management for Dynatrace Managed
PurePath® distributed traces are end-to-end transactions captured by OneAgent. Each minute, a statistically relevant number of end-to-end distributed traces is captured within each monitored process. Each trace contains code-level and business insights derived from service-level calls to multiple tiers. Because each trace is captured fully and end-to-end, second- and third-level tiers often capture more total service calls than entry-point processes.
When the volume of transactions is high, capturing all traces can increase network bandwidth demands. OneAgent provides a built-in limiter to manage such cases. Each process monitored by OneAgent is allowed to start only a given number of distributed traces per minute. Once the quota is reached, the monitored traffic is used in the most effective way possible via the intelligent mechanism of Adaptive traffic management.
How is Adaptive traffic management different from other sampling mechanisms?
In typical applications, the distribution of requests is not even. It's rather a combination of: a large number of unique URLs, a medium number of important requests, and, finally, a few kinds of requests that make up the majority of the traffic (for example, image requests or status checks).
With Adaptive traffic management, OneAgent first calculates a list of top requests starting each minute, from which it then captures:
- Most traces of unique and rare requests.
- A significant but lower volume of highly frequent requests.
Because the sampling is not random, all important data is captured while maintaining a statistically valid sample set.
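The prioritization described above can be illustrated with a small sketch. This is not OneAgent's actual algorithm, which is not documented here; it is a hypothetical max-min fair allocation that assumes the per-minute request counts are already known. The function name `plan_capture` and the sample URLs are made up for illustration.

```python
def plan_capture(request_counts, quota):
    """Split a per-minute capture budget across request types (illustrative only).

    Rare request types are served first and captured in full when possible;
    the budget that remains is shared among the more frequent types.
    """
    remaining = quota
    plan = {}
    # Visit request types from rarest to most frequent.
    pending = sorted(request_counts.items(), key=lambda item: item[1])
    while pending:
        fair_share = remaining // len(pending)
        name, seen = pending.pop(0)
        captured = min(seen, fair_share)
        plan[name] = captured
        remaining -= captured
    return plan


# Hypothetical traffic observed within one minute.
counts = {"/status": 5000, "/img": 3000, "/checkout": 200, "/admin/report": 3}
plan = plan_capture(counts, quota=1000)
# Requests that were seen but not captured, per request type.
not_captured = {name: counts[name] - plan[name] for name in counts}
```

In this sketch the rare `/admin/report` and `/checkout` requests are captured completely, while `/img` and `/status` share what is left of the budget. The `not_captured` values play the role of the count of similar requests that were seen but not captured.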
You can see the effect of Adaptive traffic management in the distributed trace list. If OneAgent is sampling and not all requests are captured, the captured traces indicate that similar requests were not captured by showing the message [amount] more like this.
In this way, OneAgent reduces the data sent to your environment, ensuring that the amount of captured traces stays within the limits of your Dynatrace agreement.
Using Adaptive traffic management to reduce the volume of processed data saves network bandwidth and, in Dynatrace Managed environments, the CPU, memory, network, and storage resources that would otherwise be required to process and store the additional data.
Quota per process
In Dynatrace Managed, the quota of new distributed traces/min that each process can send to Dynatrace is 1,000. Because traffic management depends on your application architecture, network traffic is limited for high-volume entry points (such as a load balancer or NGINX) and spikes might occur.
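To get a feel for what the per-process quota means for a high-volume entry point, a rough back-of-the-envelope calculation can help. The helper names below are hypothetical, and the calculation assumes the quota acts as a simple upper bound on newly started traces per minute.

```python
def capture_ratio(requests_per_minute, quota=1000):
    """Approximate share of entry-point requests that can start a new trace."""
    if requests_per_minute <= 0:
        return 1.0
    return min(1.0, quota / requests_per_minute)


def max_new_traces(process_count, quota=1000):
    """Upper bound on newly started traces per minute across all processes."""
    return process_count * quota


# Hypothetical load balancer process handling 20,000 requests per minute.
busy_ratio = capture_ratio(20_000)
# Hypothetical low-traffic service handling 400 requests per minute.
quiet_ratio = capture_ratio(400)
# Hypothetical environment with 50 monitored processes.
upper_bound = max_new_traces(50)
```

With the default quota, the busy entry point in this example can start traces for at most about 5% of its requests, whereas the low-traffic service stays below the quota and is not limited.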
Adaptive capture control
You can manage the quota of new entry-point distributed traces captured per minute via Adaptive capture control, both on the environment level and per process or process group.
Adjusting adaptive capture control can help in specific cases; for example, if a Dynatrace Managed environment for load testing consumes too many network, disk, and CPU resources that you'd rather use for production monitoring. All analyses consider adjustments transparently without affecting service analysis features, except the distributed traces list or metrics.
To manage the quota of new distributed traces/min:
- Go to Cluster Management Console > Environments and select your environment.
- Optional: In the Cluster overload prevention settings, set the environment quota of Number of newly monitored entry point traces captured per process/minute. The default value is 1,000; the environment quota can be increased up to 100,000.
- Select Go to the environment.
- In the Dynatrace menu, go to Settings > Server-side service monitoring > Deep monitoring > Adaptive capture control.
- Select Global or Process group override.
You can reduce or increase the quota, respectively, to reduce the percentage of monitored incoming traffic or to ensure higher fidelity.
If your environment quota is set to 100,000 and you set adaptive capture control to the highest value, OneAgent is effectively instructed to capture all requests, even rare ones, within high-volume environments.
Setting the environment quota and adaptive capture control values too high can cause resource shortages and increase hardware expenditures.
Monitoring
You can use the preset dashboard OneAgent Traces - Adaptive traffic management to track usage and thresholds of Adaptive traffic management.
Adaptive load reduction
Adaptive load reduction is a dynamic mechanism that targets environments with a high volume of traffic compared to their assigned host units. Because Dynatrace Managed environments can process a limited number of service calls per minute (depending on the node CPU amount and memory availability), this is particularly helpful for managing sporadic spikes in the volume of processed distributed traces.
When the number of service calls that an environment can process is exceeded, adaptive load reduction is triggered:
- New incoming distributed traces are skipped at random, gradually reducing the number of processed distributed traces. Note that service calls of full distributed traces already in progress are not targeted.
- The number of skipped distributed traces is taken into account to ensure stable statistical validity for all metrics, charts, baselining, and events.
- You are informed about the reduction of processed data by:
  - An alert message in the Dynatrace web UI: Server [amount] activated adaptive load reduction
  - A message in the distributed trace list: [amount] more like this
Adaptive load reduction safeguards your Dynatrace environment from sporadic traffic spikes.
While occasional activation (for example, to cover spikes) will not harm the fidelity of your monitoring data, consistent use for intervals of 15 minutes or longer can impact the accuracy of your monitoring data and metrics because not all data is processed.
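The idea of skipping traces at random while keeping metrics statistically stable can be sketched as follows. This is a simplified illustration, not the Dynatrace implementation: the function names, the fixed `keep_probability`, and the sample durations are assumptions made for the example.

```python
import random


def reduce_load(durations_ms, keep_probability, rng):
    """Randomly skip incoming traces; return kept durations and the skip count."""
    kept = [d for d in durations_ms if rng.random() < keep_probability]
    return kept, len(durations_ms) - len(kept)


def extrapolate(kept, skipped):
    """Estimate metrics for all traces from the kept sample and the skip count."""
    total_count = len(kept) + skipped
    if not kept:
        return {"count": total_count, "mean_ms": None, "total_ms": None}
    mean_ms = sum(kept) / len(kept)
    return {
        "count": total_count,
        "mean_ms": mean_ms,
        # Scale the sample mean by the full trace count, skipped ones included.
        "total_ms": mean_ms * total_count,
    }


rng = random.Random(42)  # seeded only to make the example reproducible
incoming = [100.0] * 1000  # hypothetical response times in milliseconds
kept, skipped = reduce_load(incoming, keep_probability=0.5, rng=rng)
stats = extrapolate(kept, skipped)
```

Because the number of skipped traces is recorded, the request count stays exact, and totals can be scaled up from the sample. The estimates for values such as the mean become less reliable as more traces are skipped, which is consistent with the caution above about prolonged activation.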