IT Operations and Digital Disruption

One of the consequences of digital disruption is that IT is propelled much closer to users, who expect applications and services to be available and to perform well anytime, anywhere, on any device. Now more than ever, communicating with the business means communicating with these users, and effective communication requires a clear understanding of their experience with the application services you deliver. But how well do you understand end-user experience? To answer that question, let's first define end-user experience as the transaction response time a user receives from an application service; "click to glass" seems to be the term in vogue. Or look at the question from a different angle: How is the quality of your application services measured? Are you driven primarily by infrastructure and application performance metrics, or by end-user experience?

Whether your data center supports customer-facing applications that directly generate revenue, or enterprise applications that automate and manage critical business processes – or, increasingly, both – application performance matters. It is the way your service quality is perceived by your users – consumers or employees. Business owners understand this intrinsically; poor ecommerce site performance results in lost revenue and damaged loyalty, while internally, poor application performance leads to decreased productivity and lost business opportunities.

APM’s Three Aspects

Application performance monitoring can be segmented into three aspects, or disciplines:

  • Device monitoring provides critical visibility into the health of the infrastructure components – the servers, disks, switches, routers, firewalls, desktops, etc. required to deliver application services.
  • Application monitoring provides visibility into critical application components, such as application containers, methods, databases, APIs, etc.
  • End-user experience monitoring, within its performance context, provides visibility into user productivity, and provides a top-down business-centric perspective of both device and application performance.

It's clear that the first two aspects – device and application monitoring – are fundamentally important. We should also be able to agree that, to answer our earlier question about service quality, you must also measure the end-user's experience. When users complain, they speak in terms of transaction speed, not network latency, interface utilization, database queries, Java server pages, or CPU speed. (Well, if you're on the network team, you'll point out that users often say the network is slow, but we generally know better.)

The user’s experience can be considered the intersection between business metrics (productivity) and IT metrics (device and application); it’s the one metric both groups have in common.

[Figure: the three aspects of APM]

Most of us are pretty good at device and application monitoring; the same often can't be said for end-user experience monitoring. So what are the penalties if you're not measuring end-user experience?

  • You won’t know users are having a problem until they call you (unless something catastrophic happens).
  • You will chase after problems that don’t affect users (because you’re monitoring dozens, or hundreds, of metrics of varying impact).
  • You won’t have a description of the problem that matches your metrics (and therefore don’t have a validated starting point for troubleshooting).
  • You won’t know when or if you’ve resolved the problem (without asking the users).

At Cisco Live this week, an IT manager told me of his frustration with all-too-frequent 3 a.m. infrastructure or application alerts: should he get up and investigate? He had no idea if the problem of the moment had any impact on users. Only by adopting end-user experience monitoring was he able to qualify and prioritize his response.

Don’t We Already Measure End-User Experience?

It's true that many applications – particularly those based on Java and .NET platforms – may already be instrumented with APM agents, some of which provide exactly this insight into end-user experience. However:

  • These APM solutions are often not used by operations teams
  • Not all Java and .NET apps will be instrumented (and if you’re not using Dynatrace, you might only be sampling transactions)
  • Many application architectures don’t lend themselves to agent-based instrumentation

IT operations teams therefore usually rely on more traditional infrastructure-centric device and network monitoring solutions. The rise of application awareness (primarily in Application-Aware Network Performance Monitoring – AA NPM – solutions, but also in device management offerings) has given IT varying degrees of insight into application behavior – and sometimes a degree of insight into application performance. However, without visibility into end-user experience, without a user transaction-centric starting point, these tools do little to foster the communication and collaboration we mentioned earlier. As I pointed out in a recent webcast, "The Top Five Benefits of Transaction-Centric NPM," AA NPM solutions are generally quite limited in their ability to measure actual end-user experience, especially across a broad range of application architectures. Instead, these tools use key infrastructure measurements such as network latency, packet loss, jitter, etc., as indicators or hints of application performance as experienced by end users. These metrics may be quite meaningful to IT specialists, but they aren't end-user experience and don't provide a basis for effective communication.

Transaction Performance Monitoring

Just as we framed end-user experience in its performance context, we need to define the term transaction to understand its relevance. Our interest is in how well the transaction measurements performed by a monitoring solution align with that definition of end-user experience. I covered this in some detail in the webcast, but here's a quick summary:

Transaction type                     | Measures                                                      | Good for
Ping                                 | Network round-trip delay                                      | Network delay
Session-layer response time          | Request/response delay                                        | Lots of hard-to-interpret statistics
Application component response time  | Component-level performance (e.g., jsp, aspx, image request) | Identifying slow application components for development team investigation
User transaction (EUE)               | Click to glass                                                | Understanding your users; communicating with business teams

Most AA NPM solutions will offer session-layer response time as an application performance metric, categorizing performance measurements based on traffic classification approaches such as NBAR. User-level transaction monitoring (I refer to this as Transaction-Centric NPM) – essentially, end-user experience – from a probe-based solution is accomplished via sophisticated deep packet inspection (DPI) algorithms and heuristics specific to the application protocol. Important examples include HTTP/S, of course, but also SAPGUI, Oracle Forms, Microsoft Exchange, XML/SOAP, WebSphere MQ, various database protocols, etc. Each of these protocols delivers transactions to end users using different approaches to encoding critical details such as transaction identification, request parameters, response payload and user information. By virtue of a wide range of protocol support, a transaction-centric probe-based solution can provide a consistent business-centric view into end-user experience across a broad portfolio of applications.
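To make the distinction concrete, here's a deliberately simplified Python sketch of the kind of work a transaction-centric probe does for plain HTTP traffic: derive a transaction name from the request and measure response time from first request byte to last response byte. The function names and timing model here are hypothetical illustrations, not any product's implementation; real DPI engines reassemble TCP streams from mirrored traffic and apply protocol-specific decoders (SAP GUI, Oracle Forms, SOAP, database wire protocols, and so on).

```python
# Hypothetical sketch of HTTP transaction identification; real probe-based
# solutions do this with protocol-specific DPI on reassembled traffic.
from urllib.parse import urlsplit

def transaction_name(request_line: str) -> str:
    """Derive a transaction identifier from an HTTP request line."""
    method, target, _version = request_line.split(" ", 2)
    path = urlsplit(target).path          # drop query-string parameters
    return f"{method} {path}"

def response_time_ms(first_request_byte_ts: float, last_response_byte_ts: float) -> float:
    """Rough 'click to glass' approximation: first request byte to last response byte."""
    return (last_response_byte_ts - first_request_byte_ts) * 1000.0

# One observed request/response exchange (timestamps in seconds)
name = transaction_name("GET /sap/orders/lookup?id=42 HTTP/1.1")
rt = response_time_ms(first_request_byte_ts=10.000, last_response_byte_ts=10.450)
print(name, f"{rt:.0f} ms")   # GET /sap/orders/lookup 450 ms
```

The point is that once each exchange carries a meaningful transaction name, measurements can be grouped by what the user actually did rather than by port or protocol.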

Consider this session-layer response time view into a mission-critical SAP application:

[Figure: session-layer response time view of the SAP application]

Since transaction types cannot be identified, all transactions (46.7K in this example) are treated equally. Average response time is 677 ms, but this metric includes an unknown mix of transactions, some completing in a few milliseconds, others taking dozens of seconds. As a result, this metric offers very little insight.

A transaction-centric approach separates transactions into different measurement buckets; no longer is a 25-second report averaged with hundreds of 20-millisecond queries. The business context becomes immediately clear, along with transaction-specific performance breakdowns that isolate the fault domain for a failing transaction:

[Figure: transaction-centric view showing business context and per-transaction performance breakdown]
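Here's a toy Python illustration of why the bucketing matters; the numbers are made up for the example and are not taken from the SAP screenshots above. One 25-second report blended into hundreds of 20-millisecond queries produces an unremarkable average, while per-transaction buckets expose the slow transaction immediately.

```python
# Toy example (made-up numbers): a blended average hides the slow transaction,
# while per-transaction buckets expose it.
from collections import defaultdict
from statistics import mean

samples = [("Monthly report", 25_000)] + [("Order query", 20)] * 500  # (name, ms)

print(f"blended average: {mean(rt for _, rt in samples):.0f} ms")     # ~70 ms

buckets = defaultdict(list)
for name, rt in samples:
    buckets[name].append(rt)

for name, times in sorted(buckets.items()):
    print(f"{name}: n={len(times)}, avg={mean(times):.0f} ms, max={max(times)} ms")
```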

BizOps is the Goal

Alignment with your business peers requires a common language and shared priorities. (BizOps isn't a word, and shouldn't become one. But it stuck in my head, so this is my attempt to get it out.) This doesn't mean that the business learns IT's language, but rather the opposite: IT must learn the language of the business. As we've seen, understanding end-user experience as a key metric of business productivity provides the foundation for this communication. This perspective becomes the means to translate IT metrics into business metrics, enabling mutually understood goals and decisions. To accomplish this, operations teams need to gain visibility into end-user experience that traditionally doesn't exist in the AA NPM world. This transaction-centric approach is a core differentiating capability of Dynatrace's Data Center RUM.

Two Notes from Cisco Live

I'll close with two quick observations from Cisco Live this week. First, I counted at least 12 vendors touting application performance and/or end-user experience capabilities – a testament to the potential value, but also a warning to pay attention to how these terms are defined. Second, quite a few of the visitors I spoke with really understood the value of end-user experience and of a transaction-centric approach to network performance monitoring. I found this a significant departure from just a couple of years ago, and anecdotal evidence of the trend within network operations towards responsibility for application delivery.