Adaptive traffic management for Dynatrace SaaS
Dynatrace Full-Stack Monitoring provides a variety of features, including distributed tracing for applications via the patented PurePath® technology. Each monitored application or microservice continuously produces distributed traces, containing code-level and business insights, that are sent to Dynatrace.
Depending on the number of application transactions and on the host units consumed by your environment, OneAgent captures a certain number of end-to-end traces per minute. When the volume of transactions is high, the number of traces that OneAgent could capture might exceed the trace volume available in your environment based on currently active host units.
Once the quota is reached, OneAgent starts sampling in the most effective way possible, via the intelligent mechanism of adaptive traffic management. The resulting capture rate is called the OneAgent capture rate.
How is adaptive traffic management different from other sampling mechanisms?
In typical applications, the distribution of requests is not even. It's rather a combination of: a large number of unique URLs, a medium number of important requests, and, finally, a few kinds of requests that make up the majority of the traffic (for example, image requests or status checks).
With adaptive traffic management, OneAgent first calculates a list of top requests at the start of each minute, from which it then captures:
- Most traces of unique and rare requests.
- A significant but lower volume of highly frequent requests.
In this way, OneAgent reduces the data sent to your environment, ensuring that the number of captured traces stays within the host-unit limits of your Dynatrace agreement. Because the sampling is not random, all important data is captured while maintaining a statistically valid sample set.
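The idea of capturing rare requests fully and frequent requests only partially can be illustrated with a small sketch. This is not OneAgent's actual algorithm, which is not described here; it is a simple "fair share" cap, and the function name `plan_capture`, the request names, and the numbers are all hypothetical.

```python
def plan_capture(counts, budget):
    """Return a capture factor (0.0 to 1.0) per request type.

    counts: dict mapping a request name to the number of requests seen per minute
    budget: target number of captured traces per minute (hypothetical input)
    Illustrative only: rare requests keep a factor of 1.0, frequent ones are capped.
    """
    if sum(counts.values()) <= budget:
        # Everything fits within the budget, so no sampling is needed.
        return {name: 1.0 for name in counts}

    remaining = budget
    left = len(counts)
    cap = 0.0
    # Walk from the rarest to the most frequent request type.
    for name, n in sorted(counts.items(), key=lambda kv: kv[1]):
        fair_share = remaining / left
        if n <= fair_share:
            # Rare request: capture all of it and pass the unused share on.
            remaining -= n
            left -= 1
        else:
            # Every remaining (more frequent) request type is capped here.
            cap = fair_share
            break

    return {name: 1.0 if n <= cap else cap / n for name, n in counts.items()}


# Hypothetical per-minute request counts and a budget of 500 traces per minute.
counts = {"/status": 1000, "/image": 600, "/checkout": 40, "/report": 10}
factors = plan_capture(counts, budget=500)
expected_captured = sum(counts[name] * factors[name] for name in counts)
```

With these made-up numbers, `/checkout` and `/report` are captured every time, while `/status` and `/image` are each limited to about 225 traces per minute, so the expected total matches the budget of 500.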
The following table represents a top-request calculation example, along with the respective capture rates.
|Request||Number of requests processed by the application||Capture factor||Captured distributed traces|
…50 other URIs
In this example, a bit more than 1,000 requests/min are captured by OneAgent, according to the configured target number of requests. Depending on the capture factor, URIs are captured each time (URIs C, D, and 50 other URIs) or only 50% of the time (URIs A and B). In this last case, requests are traced end-to-end by OneAgent over 600 times/minute.
You can see the effect of adaptive traffic management in the distributed trace list. If OneAgent is sampling and not all requests are captured, captured traces indicate that similar requests have not been captured by showing the message "[amount] more like this" in the distributed trace list.
Because adaptive traffic management reduces the volume of processed data, it also saves considerable network bandwidth.
In Dynatrace SaaS, traffic management depends on the environment quota of allowed full-service call volume per minute. A single distributed trace can contain multiple full-service calls. The maximum number of full-service calls per minute (and therefore traces per minute) that your environment can receive scales with your license, because it's based on the number of host units that are active in your environment.
Allowed full-service call volume per minute = 250 full-service calls × active host units
This quota is maintained on the environment level and is shared across all monitored applications. In a sense, low-volume applications share their unused transaction volume with high-volume applications that need it.
Example: A moderate environment of 50 hosts with 32 GB each (= 100 host units) can process up to 25,000 full-service calls per minute.
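The quota arithmetic can be checked with a short sketch. The constants come from this page (250 full-service calls per active host unit, and the environment minimum of 5,000 given in the full-service call definition); the function name is hypothetical and this is not a Dynatrace API.

```python
CALLS_PER_HOST_UNIT = 250      # full-service calls per minute per active host unit
MIN_CALLS_PER_MINUTE = 5_000   # environment minimum stated on this page


def allowed_full_service_calls(active_host_units: float) -> int:
    """Return the per-minute full-service call quota for an environment."""
    return max(MIN_CALLS_PER_MINUTE, int(CALLS_PER_HOST_UNIT * active_host_units))


hosts = 50
host_units_per_host = 2  # a 32 GB host counts as 2 host units in the example
print(allowed_full_service_calls(hosts * host_units_per_host))  # 25000
```

An environment with fewer than 20 active host units would still get the 5,000-call minimum under this reading of the rule.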
Monitor adaptive traffic usage and thresholds
You can use the preset dashboard OneAgent Traces - Adaptive traffic management to track usage and thresholds of adaptive traffic management. Metrics and charts provide insights into:
- Full-service calls per host unit
- Captured full-service calls
- OneAgent capture rate
Does adaptive traffic management affect monitoring results?
The short answer is: not at all.
The shaping of traffic is accounted for transparently and done in a way that ensures statistical validity while capturing rare requests with high probability. All charts show the total real number of requests that your application processes, as does all ad-hoc analysis you might perform. Dynatrace AI is not impacted by this, nor is alerting. You will not see a difference in charts or service call analysis data unless you're looking at a single distributed trace. The only place where this traffic shaping is visible is in the distributed traces list, which displays a message like "[number of traces] more like this".
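One common way to keep totals accurate under sampling is to let each captured trace stand for 1 / capture factor requests. The sketch below illustrates that general technique only; whether Dynatrace accounts for sampling in exactly this way is an assumption, and the function name and data are hypothetical.

```python
from collections import defaultdict


def estimate_request_counts(captured_traces):
    """Estimate real request counts from sampled traces.

    captured_traces: iterable of (request_name, capture_factor) pairs, where
    capture_factor is the fraction (0 < factor <= 1) of requests of that kind
    that were captured. Each captured trace represents 1 / factor requests.
    """
    totals = defaultdict(float)
    for name, capture_factor in captured_traces:
        totals[name] += 1.0 / capture_factor
    return dict(totals)


# Hypothetical minute: 300 traces captured at 50%, 40 traces captured at 100%.
captured = [("/status", 0.5)] * 300 + [("/checkout", 1.0)] * 40
estimates = estimate_request_counts(captured)
print(estimates)  # {'/status': 600.0, '/checkout': 40.0}
```

In this picture, each captured `/status` trace stands for itself plus one similar request that was not captured, which is the kind of information a "more like this" message conveys.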
- Full-service call
Server-side call that starts: a distributed trace, a service call at a deep monitored tier, or a custom service call.
All requests for web request services and web services (except for external ones), RMI services, messaging services and custom services are full-service calls.
External calls (such as database calls, external web requests, or generally any opaque service call) are not full-service calls, and so aren't counted against your traffic limit.
The minimum number of full-service calls per minute in a given environment is 5,000 (the equivalent of 20 host units). Each process can start between 50 and 50,000 full-service calls per minute.
- Active host units
Host units currently in use and connected to the environment (not the host units assigned to the environment).