Microservices are a great architectural pattern. They are not completely new – but, either because of the name or because the right people promoted it at the right time, microservices have gained widespread acceptance. The more software applies this architectural approach, the more PurePaths I receive through my Share Your PurePath program that show some of the bad .NET implementation patterns. This blog is dedicated to one of my Dynatrace users who sent me a couple of PurePaths and asked me: How can you explain the 1.2s gap between IIS and ASP.NET when processing our REST calls that come in from our SPA (Single Page App)? With the "gap" he was referring to the gap in the Elapsed Time column in the Dynatrace PurePath Tree, which indicates when a particular method was executed relative to when the transaction started:

Elapsed Time can be used to understand where you have bottlenecks in your execution, e.g. when passing requests from one queue or tier to another.

I’ve seen this type of behavior before and answered similar questions once or twice on our Dynatrace Discussion Forum (Long Handoff between IIS and CLR, High Elapsed Time in PurePath, Async Path in ASP.NET impacting Elapsed Time) as well as blogged about it. After looking at the first PurePath (the one in the screenshot above) I started giving him my standard answer:

  • You may have some native IIS modules (URL Rewrite, Authentication, Compression) configured that are executed before IIS passes the request to your application code
  • You might run low on available ASP.NET worker threads, which is why IIS needs to wait until it can pass that request from the initial handler to your application code (see the thread pool sketch after this list)
  • Please have a look at my answers on the forum
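
By the way, if you want to check the second point yourself, the CLR thread pool is easy to inspect from code. Here is a minimal diagnostic sketch (my own illustration, not code from the analyzed application) that logs how many worker and I/O completion threads are still available – if the available worker count keeps heading toward zero under load, incoming requests queue up in IIS and you see exactly the kind of gap described above:

```csharp
// Minimal diagnostic sketch (illustrative only): log available vs. maximum
// CLR thread pool threads. Call this periodically, or from a debug endpoint,
// while the application is under load.
using System;
using System.Threading;

public static class ThreadPoolMonitor
{
    public static void LogThreadPoolStatus()
    {
        int maxWorker, maxIo, availableWorker, availableIo;
        ThreadPool.GetMaxThreads(out maxWorker, out maxIo);
        ThreadPool.GetAvailableThreads(out availableWorker, out availableIo);

        // If availableWorker trends toward 0, new requests have to wait for a
        // free thread - that wait shows up as the "gap" between IIS and ASP.NET.
        Console.WriteLine(
            "Worker threads: {0}/{1} available, I/O completion threads: {2}/{3} available",
            availableWorker, maxWorker, availableIo, maxIo);
    }
}
```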

While this answer would have allowed him to go back to his team to find the real root cause, I simply used the new Automatic Problem Detection feature in Dynatrace AppMon to see what else there was to find in the other PurePaths he sent me. Maybe I could find out whether some bad coding – instead of bad native modules – was responsible for hogging all these threads.

And voilà – two clicks later I found the answer – well, actually two answers: some of the requests suffered from the N+1 Web Service and Excessive Threading patterns, both detected in the same type of transaction. Meaning: one transaction, which actually turns out to execute rather fast on average (~300ms), consumes 14 threads while executing 13 external web service calls in parallel. This behavior slows down other requests that would need just a single thread but can’t get one, because all threads are busy executing background external web service calls. If you want to read more about these and other patterns, and how to detect them, check out my blog on Automatic Problem Detection or watch the latest Performance Clinic for a “live” demo.
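
To make these two patterns more tangible, here is a minimal sketch of what such an implementation typically looks like – my own illustration with a hypothetical endpoint, not the customer's actual code. A single incoming request fans out into parallel, blocking REST calls, and every one of them occupies a thread for the duration of the remote round trip:

```csharp
// Illustrative sketch of the N+1 / Excessive Threading anti-pattern
// (hypothetical endpoint and names - not the analyzed application's code).
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

public class ProductDetailsService
{
    // One incoming request fans out into one blocking REST call per id
    // (13 ids => 13 additional threads blocked for the remote round trip).
    public IList<string> GetProductDetails(IEnumerable<int> productIds)
    {
        Task<string>[] tasks = productIds
            .Select(id => Task.Run(() =>
            {
                using (var client = new WebClient())
                {
                    // Synchronous call: this thread sits blocked (~250ms in the
                    // analyzed case) and cannot serve any new incoming request.
                    return client.DownloadString(
                        "https://backend.example.com/api/product/" + id);
                }
            }))
            .ToArray();

        Task.WaitAll(tasks); // the original request thread blocks here as well
        return tasks.Select(t => t.Result).ToList();
    }
}
```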

But now – let me show you how I analyzed this particular problem and how you can do the same!

Excessive Web Service Calls on Parallel ASP.NET Threads

I mentioned that with the Automatic Problem Detection feature I could easily find the PurePaths that consume all the threads while other PurePaths suffered from the 1.2s gap because no threads were available for them any longer. Here is the screenshot of that feature on his data set. I simply said: Dynatrace, show me those PurePaths that suffer from the N+1 Problem Pattern. Out comes a list of PurePaths and, when I click each of them, I get more details about what else is going on in that path!

Dynatrace automatically shows me those PurePaths that show a certain problem pattern, e.g. N+1 Web Service Calls, Excessive Threading, …

This ability alone is one of the reasons why I think having 100% transactional coverage is a key requirement for any APM solution in order to identify the real root cause of performance and scalability problems – especially as we move toward microservice architectures where applications spawn many threads or put many messages into queues. A single “resource consuming” transaction can have a very fast response time (as in my case, 300ms) but, through its implementation and behavior, will impact other transactions because it consumes scarce resources such as threads, memory, network bandwidth or a connection from a connection pool (database, outgoing network, etc.). So remember: 100% End-to-End Tracing across runtime and thread boundaries should be a key requirement when you are looking at tools to help you in your distributed microservice environments.

The next screenshot is the Transaction Flow that shows us what is going on within these transactions that are executed by the SPA (Single Page App), highlighting where the excessive web service calls come from:

This is the Transaction Flow for a single user interaction on the SPA (Single Page App) all the way through to the backend, showing the 3 AJAX calls and the 13 backend REST calls.

From here I typically take two additional steps: I analyze which remote service calls are executed, and I also want to find out how they are executed. Drilling into the PurePath Tree and the Web Request Dashlet shows me exactly that: most of the remote requests are not only executed against the same endpoint (= the same service) but also with very similar parameters:

Dynatrace shows us all backend web service calls including endpoints and parameters. It is easy to spot the classic N+1 pattern where the same endpoint gets called with different “bind values”/parameters. We can also see how long these threads are blocked.

I can also see that each call is executed on its own background thread – and this, as we learned in the beginning, is the actual problem of this application: a rather fast transaction is consuming and blocking precious threads:

Dynatrace PurePath shows us exactly which thread is involved in executing which piece of work, when it starts and how long it takes to finish the job.

To summarize: for every request that the SPA (Single Page App) sends to that application, we have one thread working on the initial request and 13 additional threads executing REST calls to an external service. These calls take 250ms on average, which means that 13 additional threads are blocked and can’t be used for new incoming requests. That’s why new incoming requests have to wait until another thread becomes available.

Whereas in the beginning I wanted to advise them to look into native modules, I now know that the problem lies in their own application code – in the way they call these backend services.
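
What could a better implementation look like? Two directions come to mind (these are my suggestions, not something dictated by the PurePaths): batch the 13 calls into a single backend request to eliminate the N+1 pattern altogether, or at least make the remote calls truly asynchronous so that no thread is blocked while waiting on the network. Here is a minimal sketch of the second option, again with a hypothetical endpoint:

```csharp
// Illustrative sketch of a non-blocking alternative (hypothetical endpoint):
// async/await with HttpClient releases the thread while the REST calls are
// in flight, so the thread pool is not drained by waiting requests.
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public class ProductDetailsServiceAsync
{
    // HttpClient is meant to be shared and reused rather than created per call.
    private static readonly HttpClient Client = new HttpClient();

    public async Task<IList<string>> GetProductDetailsAsync(IEnumerable<int> productIds)
    {
        IEnumerable<Task<string>> calls = productIds.Select(id =>
            Client.GetStringAsync("https://backend.example.com/api/product/" + id));

        // No thread is blocked while the 13 calls are in flight; threads are only
        // used briefly to process the responses as they complete.
        string[] results = await Task.WhenAll(calls);
        return results.ToList();
    }
}
```

The batching option is the stronger fix, of course: a backend endpoint that accepts all 13 IDs in a single call removes the N+1 pattern entirely instead of just making it cheaper on the thread pool.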

Try it on your own

If you want to try this on your own, simply obtain your own Dynatrace AppMon & UEM Trial: fill out the form on http://bit.ly/dtpersonal and wait for the email with the activation link. Get started by watching some of the tutorials on my YouTube Channel: http://bit.ly/dttutorials. If you have your own PurePaths that you want to share with me, please feel free to send them over as described here: http://bit.ly/sharepurepath. And here is the best part: after the initial 30-day trial period the license will automatically convert to a Personal License, which means you can keep Dynatrace with the full feature set FOR LIFE on your local dev or test environment. That’s our contribution to making sure that we are all building better microservice apps from the start.