Tracing problems in Project Stonehenge and other heterogeneous systems

In my previous blog post I argued that interoperability is not just about letting systems talk to each other. Having the technology to connect different platforms is only the first step. One major problem in heterogeneous systems is the lack of a common set of tools that lets you easily find the root cause of a problem occurring in a distributed transaction that crosses platform boundaries (Java and .NET).

The current version of Project Stonehenge allows me to show you how to use the Dynatrace Platform to analyze the root cause and impact of problems from a transactional perspective.

Transactional Tracing with Dynatrace on Project Stonehenge

Stonehenge is a reference implementation of a Stocktrader application. The application has the following tiers:

  • Frontend (Implemented with ASP.NET and PHP)
  • Business Services (Implemented with ASP.NET, Java and PHP)
  • Order Processing Services (Implemented with ASP.NET, Java and PHP)

The Frontend component talks to the Business Service to query the current status of the user’s portfolio and to get stock quotes. When buying or selling stocks, the Business Service talks to the Order Processing Service, which handles the request asynchronously. Both Business and Order Processing are implemented in .NET and Java (I will leave PHP out here, but it is also possible to talk to a PHP implementation). Configuration files determine which implementation (Java or .NET) is used.

Step 1: Define a System Profile

The first step with Dynatrace is to define a System Profile describing the system to analyze. I end up with 4 different Agent Groups – which represent the logical grouping of the services provided by the Stonehenge project:

  • Frontend (hosted in IIS)
  • Business Service for .NET (hosted in standalone application)
  • Order Processing for .NET (hosted in standalone application)
  • Stock Trader Services for Java (hosted in WSO2 Web Application Server)

I create only one configuration for the Java implementation because the default deployment of Stonehenge hosts both services in a single WSO2 Web Application Server instance.

Step 2: Executing transactions with different tier-to-tier communication configurations

I start Stonehenge with different interoperability configuration options, for example letting the Business Service for .NET talk to Order Processing for .NET, or letting it talk to the Order Processing implemented in Java. Dynatrace picks up the individual transactions, and I can then visualize each PurePath (the representation of a single distributed transaction) in different ways.
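To illustrate the general idea of following one transaction across tiers, here is a minimal Python sketch. It is not how Dynatrace or PurePath works internally; it only shows the common approach of passing a trace ID from tier to tier and grouping the recorded spans afterwards. The header name `X-Trace-Id`, the `handle` and `flows` helpers, and the tier labels are assumptions made for this example.

```python
import uuid
from collections import defaultdict

TRACE_HEADER = "X-Trace-Id"  # hypothetical header name
SPANS = []  # stands in for what an agent on each tier would report


def handle(chain, headers=None):
    """Record a span for the first tier in `chain`, then call the next tier."""
    headers = headers or {}
    # Reuse the incoming trace ID; only the first tier creates a new one.
    trace_id = headers.get(TRACE_HEADER) or uuid.uuid4().hex
    SPANS.append((trace_id, chain[0]))
    if len(chain) > 1:
        handle(chain[1:], {TRACE_HEADER: trace_id})
    return trace_id


def flows(spans):
    """Group spans by trace ID to rebuild each transaction's tier-to-tier flow."""
    by_trace = defaultdict(list)
    for trace_id, tier in spans:
        by_trace[trace_id].append(tier)
    return dict(by_trace)


# Two transactions with different tier-to-tier configurations.
to_java = handle(["Frontend", ".NET Business Service", "Java Order Processing"])
to_dotnet = handle(["Frontend", ".NET Business Service", ".NET Order Processing"])

for trace_id, tiers in flows(SPANS).items():
    print(trace_id[:8], " -> ".join(tiers))
```

Each printed line corresponds to one transaction and lists the tiers it passed through, regardless of which platform hosted each tier.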

Visualizing the transaction flow across Dynatrace Agents:

Transaction Flow from .NET Business to Java Order Processing

Transaction Flow from .NET Business to .NET Order Processing

Step 3: Analyzing service interactions

A major aspect of distributed and service-oriented applications is the actual interaction between services or components. Calling remote services has become an easy task thanks to the great support of frameworks and IDEs. This “convenience” also brings problems with it, problems that are not visible unless you analyze what is really going on in the context of an executed transaction. Dynatrace can visualize a single transaction as a sequence diagram with different zoom options. For our purpose we are interested in the actual remote communication between two service instances.

This is a visualized sequence diagram of the service interactions between the frontend and the trader service:

Remoting Interactions between Frontend and Business Service for a single web request

There seem to be a lot of roundtrips for the web page request that displays my current portfolio: the Frontend has to make 10 roundtrips to the Business Service.
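To make the roundtrip count concrete, here is a small Python sketch, independent of Dynatrace, that counts remote calls between tiers within one traced transaction. The `RemoteCall` record, the `count_roundtrips` helper and the operation names `getHoldings` and `getQuote` are hypothetical, and the data is illustrative rather than taken from the actual Stonehenge trace.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteCall:
    """One remote call observed inside a single traced transaction (hypothetical format)."""
    caller: str
    callee: str
    operation: str


def count_roundtrips(calls):
    """Count remote calls per (caller, callee) pair."""
    return Counter((c.caller, c.callee) for c in calls)


# Illustrative data only: one page request that fetches quotes one at a time.
calls = [RemoteCall("Frontend", "BusinessService", "getHoldings")]
calls += [RemoteCall("Frontend", "BusinessService", "getQuote") for _ in range(9)]

roundtrips = count_roundtrips(calls)
for (caller, callee), n in roundtrips.items():
    print(f"{caller} -> {callee}: {n} roundtrips")
```

A count like this per web request is a quick way to spot a chatty interface that might be better served by a single batched call.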

Step 4: Analyze Application Layers

A distributed application is made up of services, and each service is in turn made up of application layers. There is a layer responsible for data access, a layer that implements the business logic, a layer that offers remote communication and a layer that implements the frontend visualization. Dynatrace automatically identifies application layers (which Dynatrace calls APIs) and can visualize in which layers most of the time is spent:

Performance breakdown in application layers

You can see from the API Breakdown view that Dynatrace analyzes the layers of all components (Java and .NET). These include layers identified out of the box, such as .NET WCF, Servlet, ADO.NET and JDBC, as well as custom-defined layers such as Stonehenge or MSTrade, which contain the custom business logic code of the Java and .NET implementations of the Stocktrader application. This breakdown view allows me to quickly analyze where most of the time is spent, either for a set of transactions or for individual transactions.
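The idea behind such a layer breakdown can be sketched in a few lines of Python: map each method to a layer by its namespace or package prefix and sum the time per layer. This is only an illustration, not Dynatrace's implementation. The prefix rules for the custom layers (`org.apache.stonehenge.` and `Trade.`), the method names and all timings are assumptions made up for this example.

```python
from collections import defaultdict

# Assumed prefix-to-layer rules; the two custom prefixes are hypothetical.
LAYER_RULES = [
    ("System.ServiceModel.", ".NET WCF"),
    ("System.Data.", "ADO.NET"),
    ("javax.servlet.", "Servlet"),
    ("java.sql.", "JDBC"),
    ("org.apache.stonehenge.", "Stonehenge"),
    ("Trade.", "MSTrade"),
]


def layer_of(method_name):
    """Return the layer for a fully qualified method name, or "Other"."""
    for prefix, layer in LAYER_RULES:
        if method_name.startswith(prefix):
            return layer
    return "Other"


def breakdown(samples):
    """Sum exclusive time (ms) per layer, largest contributor first."""
    totals = defaultdict(float)
    for method_name, exclusive_ms in samples:
        totals[layer_of(method_name)] += exclusive_ms
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Illustrative (method, exclusive time in ms) samples, not measured data.
samples = [
    ("System.ServiceModel.Channels.ServiceChannel.Call", 12.0),
    ("System.Data.SqlClient.SqlCommand.ExecuteReader", 30.0),
    ("java.sql.Statement.executeQuery", 25.0),
    ("java.sql.Connection.commit", 15.0),
    ("org.apache.stonehenge.stocktrader.OrderProcessor.process", 8.0),
    ("Trade.BusinessService.buy", 5.0),
]

result = breakdown(samples)
for layer, total_ms in result:
    print(f"{layer:12s} {total_ms:6.1f} ms")
```

The same aggregation works for one transaction or for many: only the list of samples fed into `breakdown` changes.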

Step 5: Analyze transaction flow and identify root cause of problems

PurePath technology traces every single transaction that is executed against Stonehenge from the frontend through all involved services (Java or .NET). A transaction in our case starts at the ASP.NET Frontend Service when the end user browses through the pages.

PurePath showing execution path of distributed transaction

The PurePath in the screenshot shows the complete execution path of an order transaction. The transaction starts at the ASP.NET Frontend, which calls the Business Service via WCF. This service inserts a record into the database and then calls the Order Service via WCF, which updates account, quote and order information. The color code of each method indicates its performance contribution to the overall transaction. Each node in the PurePath carries additional context information, such as HTTP parameters and HTTP session information on the ProcessRequest node, or the name of the WCF contract method that was invoked. For every method we also get the time it took to execute, the time actually spent on the CPU, the time spent in synchronization or wait blocks, and the time the method was suspended by the Garbage Collector.

Seeing the complete execution trace of a transaction allows us to identify which components have actually been called in order to fulfill a request and where time was spent. It actually seems that the guys at Microsoft and Sun built a nice “waiting” method into their implementation to better simulate “under load” processing time when stock selling and purchasing orders come in. With Dynatrace I can immediately see the hot path of a single transaction which brings me to a method that takes most of the time when handling the order:

Order processing on the Java side has a built-in wait method
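The notion of a hot path can be sketched as a simple walk over a call tree: starting at the root, always descend into the child with the largest total time. The Python below is a minimal illustration under that assumption, not the algorithm Dynatrace uses. The `Node` type, the node names other than ProcessRequest, and all timings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One method invocation in a traced transaction (hypothetical structure)."""
    name: str
    total_ms: float
    children: List["Node"] = field(default_factory=list)


def hot_path(root):
    """Follow the most expensive child at every level, from root to leaf."""
    path = [root]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: child.total_ms)
        path.append(node)
    return path


# Illustrative order transaction; names and timings are made up.
root = Node("ProcessRequest (ASP.NET Frontend)", 1100.0, [
    Node("render page", 40.0),
    Node("WCF call: Business Service", 1050.0, [
        Node("insert order record", 20.0),
        Node("WCF call: Order Service", 1010.0, [
            Node("update account, quote and order", 8.0),
            Node("simulated wait", 1000.0),
        ]),
    ]),
])

for depth, node in enumerate(hot_path(root)):
    print("  " * depth + f"{node.name} ({node.total_ms:.0f} ms)")
```

In this made-up tree the walk ends at the waiting method, which mirrors how the hot path of the order transaction leads straight to the built-in wait.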

Step 6: Analyzing configuration issues

Particularly in distributed systems, the configuration of the interoperability layer can be a problem. You may end up calling the wrong service endpoints, or the configuration may be missing values. The following PurePath shows “hidden” exceptions that are never written to log files and that indicate a configuration problem. The problem may not manifest itself as a functional problem, but it definitely produces overhead due to exception handling.

Exception thrown when evaluating interoperability configuration
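How such an exception stays hidden can be shown with a short Python sketch. It is a generic illustration of the pattern, not the Stonehenge code: the `Config` class, the key `order_service_endpoint` and the endpoint URL are hypothetical. The lookup catches the error for a missing value and silently falls back to a default, so the application works while an exception is raised on every single call.

```python
class Config:
    """Config lookup that silently falls back to defaults (hypothetical example)."""

    def __init__(self, values, defaults):
        self._values = values
        self._defaults = defaults
        self.swallowed = 0  # counter added here only to make the problem visible

    def get(self, key):
        try:
            return self._values[key]
        except KeyError:
            # Nothing is logged, so the missing value never shows up in log files.
            self.swallowed += 1
            return self._defaults[key]


# The configured values are missing the endpoint, so every lookup raises internally.
config = Config(
    values={},
    defaults={"order_service_endpoint": "http://localhost:8080/order"},
)

for _ in range(100):
    endpoint = config.get("order_service_endpoint")

print(endpoint)
print("swallowed exceptions:", config.swallowed)
```

Functionally nothing is wrong, since the endpoint is resolved every time, but each of the 100 lookups paid for raising and handling an exception. Without the counter, only a trace of the transaction would reveal it.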


Interoperability is a great thing – especially when the two major players in the field (Microsoft and Sun) provide the technical base for connecting Java and .NET components. Working with a heterogeneous system and dealing with the day-to-day issues is a problem that a solution like Dynatrace solves by providing:

Andreas Grabner has 20+ years of experience as a software developer, tester and architect and is an advocate for high-performing cloud scale applications. He is a regular contributor to the DevOps community, a frequent speaker at technology conferences and regularly publishes articles. You can follow him on Twitter: @grabnerandi