software architects made a strategic decision to develop their SaaS application time cockpit, the semi-automatic time tracking solution for service professionals, on Microsoft’s Windows Azure platform. Microsoft’s promise is a highly scalable platform with full cost control. To ensure that the investment in Windows Azure pays off, they require:

  • 100% client-side visibility to identify any end-user problems in their Silverlight client.
  • 100% end-to-end visibility to analyze and fix problems proactively in their client- or server-side code.
  • Tenant-based resource monitoring (SQL Azure, Storage Services, …) to avoid nontransparent operational costs and fraud.

In this blog we share how the time cockpit team successfully implemented their offering on Windows Azure and how Compuware’s dynaTrace deep application transaction management solution allows them to manage their application and business in the Cloud.

Architecture

Let’s start with an architectural overview. time cockpit is based on a classic three-tier architecture: the user interface is implemented in Silverlight, and the application logic is exposed as REST services running redundantly on two WebRoles, which store the data in a SQL Azure database.

Classic three tier architecture of time cockpit

As an alternative to the Silverlight client, time cockpit is also available as a rich client that connects directly to the SQL Azure database.

Requirement #1: 100% End-User Visibility

One of the most exciting APM benefits for software architects was insight into their Silverlight-based client with full end-to-end visibility into Windows Azure and Azure Storage Services. End-user visibility means knowing:

  • Where our users are from
  • What environment they use (browser, browser version, OS)
  • What performance they experience
  • Which application components contribute to performance when problems occur

The dynaTrace JavaScript Agent Development Kit also allowed them to implement client-side logging in their Silverlight application. Now, they capture 100% of all user errors and exceptions and proactively monitor them on a dashboard.
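The capture pattern behind such client-side logging can be sketched in a language-neutral way: wrap every user action so that any exception is reported, with context, before it propagates. The Python sketch below is illustrative only; `report_error`, `error_log`, and the `monitored` decorator are hypothetical stand-ins for an agent’s reporting API, not the dynaTrace ADK itself.

```python
import functools
import traceback

# Illustrative in-memory sink; in a real application this would be
# the monitoring agent's reporting endpoint.
error_log = []

def report_error(action, exc):
    """Record an error together with the user action that triggered it."""
    error_log.append({
        "action": action,
        "type": type(exc).__name__,
        "message": str(exc),
        "stack": traceback.format_exc(),
    })

def monitored(action):
    """Decorator that captures 100% of exceptions raised by a user action."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                report_error(action, exc)
                raise  # keep the application's normal error handling intact
        return inner
    return wrap

@monitored("save_time_entry")
def save_time_entry(entry):
    # Hypothetical user action used only to demonstrate the wrapper.
    if "project" not in entry:
        raise ValueError("time entry has no project")
    return entry
```

Because the wrapper re-raises after reporting, the user still sees the error while the dashboard sees every occurrence of it.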

Flow through the time cockpit Silverlight client; this user action triggers three web requests (all are collapsed), the last one returns with an error

They also have the full transactional context, Silverlight client plus complete server side, for every single transaction. This allows them to rapidly pinpoint the root cause of any functional or performance problem. Furthermore, they can assess the severity of each error and find out which part of their customer base is affected by it, which leads to the next requirement.

Requirement #2: Severity Assessment for Customer Claims and Proactive Monitoring

Let’s assume the following situation: a customer calls and reports an error. Of course we want to identify and fix the error quickly. But wouldn’t it be great if we could also tell which other customers are facing the same error? We know that many users simply ignore an error and only a few take the effort to call, especially when they are trying a new beta version.

This was exactly the case when software architects released the latest beta version of time cockpit. With both end-user and deep transaction visibility, they identified the root cause within a few minutes. Furthermore, they could pinpoint exactly the customer base for whom the specific error occurred and thus assess the severity of that specific customer claim. To proactively monitor and instantly recognize this situation in the future, they added a traffic light indicating the error occurrence to their monitoring dashboard for the beta program.
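Conceptually, such a traffic light is just a threshold mapping from the observed error rate to a status color. A minimal Python sketch, with made-up thresholds (warn above 1% errors, alert above 5%) that stand in for whatever the team actually configured:

```python
def traffic_light(error_count, request_count,
                  warn_rate=0.01, alert_rate=0.05):
    """Map an error rate to a dashboard color.

    Thresholds are illustrative: warn above 1% errors, alert above 5%.
    """
    if request_count == 0:
        return "green"  # no traffic, nothing to alert on
    rate = error_count / request_count
    if rate >= alert_rate:
        return "red"
    if rate >= warn_rate:
        return "yellow"
    return "green"
```

The dashboard then only has to render the returned color, so a spike in a beta build is visible at a glance instead of waiting for a customer call.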

Requirement #3: Resource Monitoring per Tenant

With an APM solution capturing every single transaction, plus business-context analysis on top, we are able to group transactions by tenant. This lets us identify exactly which resources a particular tenant consumes, and to what extent, which in turn tells us whether we have implemented our features efficiently and whether fair-use policies are being exceeded (fraud detection). In combination with dynaTrace smart baselining we get notified automatically when such situations occur.
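Grouping captured transactions by tenant boils down to a simple aggregation. The following Python sketch is a hypothetical illustration of the idea, using CPU milliseconds as the consumed resource and a fixed fair-use limit in place of a dynamic baseline:

```python
from collections import defaultdict

def usage_per_tenant(transactions):
    """Aggregate resource consumption (here: CPU ms) per tenant.

    `transactions` is an iterable of dicts with 'tenant' and 'cpu_ms'
    keys -- an illustrative stand-in for captured transaction records.
    """
    totals = defaultdict(float)
    for tx in transactions:
        totals[tx["tenant"]] += tx["cpu_ms"]
    return dict(totals)

def fair_use_violations(totals, limit_ms):
    """Tenants whose consumption exceeds the fair-use limit."""
    return sorted(t for t, used in totals.items() if used > limit_ms)
```

In a real deployment the static `limit_ms` would be replaced by a learned baseline per tenant, which is exactly what smart baselining contributes.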

For time cockpit, they built a dashboard to monitor overall database size, database size per tenant, and CPU utilization per tenant. The dashboard shows right away that one customer is about to cross the red line and three are close to triggering a warning. The middle chart on the right shows that the pink customer consumes about one third of all CPU; the chart below it illustrates CPU utilization over time, split by tenant. Thus we become aware not only of general peak hours but also of tenant-specific peak hours.
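The time-based split can be sketched the same way: bucket per-transaction CPU time by hour and tenant, then read off each tenant’s peak hour. The function names and the `(hour, tenant, cpu_ms)` sample shape below are illustrative assumptions, not the product’s actual data model:

```python
from collections import defaultdict

def cpu_by_hour_and_tenant(samples):
    """Bucket per-transaction CPU time by hour of day and tenant.

    `samples` are (hour, tenant, cpu_ms) tuples; the result maps
    hour -> {tenant: cpu_ms}, the shape of a stacked time chart.
    """
    buckets = defaultdict(lambda: defaultdict(float))
    for hour, tenant, cpu_ms in samples:
        buckets[hour][tenant] += cpu_ms
    return {h: dict(t) for h, t in buckets.items()}

def peak_hour(buckets, tenant):
    """Hour in which the given tenant consumed the most CPU."""
    hours = {h: t.get(tenant, 0.0) for h, t in buckets.items()}
    return max(hours, key=hours.get)
```

Reading off `peak_hour` per tenant is what turns a generic load chart into tenant-specific peak hours.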

Tenant-based resource usage: overall database size is fine, four tenants stick out, and one of them is close to violating capacity

Many more metrics are feasible: the number of storage calls or Service Bus calls, if you have cost-driving Azure resources in mind; any other resource can be monitored the same way.

If you want to read more about the background of cost management in public clouds, I’d like to invite you to read this blog.

Summary

Organizations use public cloud offerings because they want to focus on their revenue-generating application. From this alone we can see the significance of Application Performance Management in public cloud environments: if we focus on our application, let’s also focus on its performance. Come back to read more about how a state-of-the-art APM solution can help you in public cloud environments.