Complexity is the new reality of web and mobile applications, with almost no new release going out without the addition of services and applications spread across many different companies. But the reality of this new interrelationship is still the same: if a third-party internet outage or issue occurs, your brand is the one that is affected.
With up to 1,500 distinct third-party services available to choose from around the world, it is sometimes difficult to even identify what a service does when it appears in your applications. This forces your team to not only be fully aware of the components you control, but also to be able to follow the trail of services that extends far outside the code and systems your company manages when issues appear.
Using Compuware Outage Analyzer data, it is now easier to open a window to these services, seeing data collected across all companies and tests to extract patterns that are sometimes hard to find if data is examined on an individual customer basis.
But what is an outage? Well, it means different things to different people. When the term is used in the media, it is easy to assume that “outage” means that all aspects of the service are unavailable from all places. These Full Service Outages are dramatic but are overshadowed in number by Partial Service Outages, those that affect either a small geographic area or a few application transactions containing the third-party host call.
In the five months of data studied here (November 2012 through March 2013), Compuware Outage Analyzer tracked 1,413 Full Service Outages and 7,969 Partial Service Outages. So while Full Service Outages may get the most press, a Partial Service Outage is more likely to occur and more likely to affect individual web and mobile applications while leaving other customers of the service (including your competitors in some cases) completely untouched.
In this day of ever-present monitoring, can a complete service outage go undetected for more than a few hours? The answer is yes. On February 1, 2013, the ad-serving company adBrite completely halted operations, an event that was announced directly to customers and tracked in the tech media.
It should have come as no surprise that the service actually did cease operations on February 1. But during early February, Outage Analyzer tracked a sudden increase in ad-serving outages. A closer look at the data, especially after separating Partial from Full Service Outages, showed that half of the Full Service Outages detected in February originated from two adBrite hostnames.
Drilling deeper into the data showed that over 200 Dynatrace tests, including a large number of sites that appear on the US Media Benchmarks, continued to contain calls to the adBrite network after it had ceased operations. This raises three possible scenarios:
- Did companies not know that adBrite was halting operations and just left these calls in place?
- Was news of the adBrite shutdown known to some in the company, but not communicated to the people who could remove the hostname from the existing application code?
- Or did companies not even know that there were calls to adBrite hostnames in their applications?
The answer will likely vary from company to company. But failing to closely track all aspects of your online applications can lead to the kind of service-level blindness seen during this period.
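One way to reduce this kind of blindness is to keep a running inventory of every third-party hostname your pages actually request and compare it against a list of services known to be retired. The following is only a minimal sketch of that idea, not how Outage Analyzer works: the function names, the example URLs, and the `retired-vendor.example` domain are all hypothetical, and the input is assumed to be a list of request URLs exported from whatever monitoring or browser tooling you already use.

```python
from urllib.parse import urlsplit


def _matches(host, suffixes):
    """True if host equals one of the suffixes or is a subdomain of it."""
    return any(host == s or host.endswith("." + s) for s in suffixes)


def third_party_hosts(request_urls, first_party_suffixes):
    """Group requested URLs by hostname, skipping first-party hosts.

    request_urls: URLs observed while loading a page (assumed input).
    first_party_suffixes: domains your own company controls.
    """
    hosts = {}
    for url in request_urls:
        host = (urlsplit(url).hostname or "").lower()
        if not host or _matches(host, first_party_suffixes):
            continue
        hosts.setdefault(host, []).append(url)
    return hosts


def flag_retired(hosts, retired_suffixes):
    """Return, sorted, the third-party hosts that belong to retired services."""
    return sorted(h for h in hosts if _matches(h, retired_suffixes))


if __name__ == "__main__":
    # Hypothetical capture of requests made by one page load.
    urls = [
        "https://www.example.com/index.html",
        "https://cdn.example.com/app.js",
        "https://ads.retired-vendor.example/serve?id=1",
        "https://metrics.other-vendor.example/t.gif",
        "https://ads.retired-vendor.example/serve?id=2",
    ]
    inventory = third_party_hosts(urls, ["example.com"])
    for host in flag_retired(inventory, ["retired-vendor.example"]):
        print(host, "is still called", len(inventory[host]), "time(s)")
```

Run regularly against fresh captures, an inventory like this gives the team that owns the application code a concrete list of hostnames to review whenever a vendor announces a shutdown.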
It’s no longer enough to fully understand the third-party services that are included in your site. You now need a plan to respond to performance issues and outages that occur with these services so that your customer experience is not affected. Having a Plan B is just not enough; a Plan C, D, E, or even K may be needed to determine how to respond when one of these services has a problem.