Third-Party Issues and the Performance Ripple Effect

On September 10, GoDaddy, a major domain registrar and DNS provider for millions of domain names, experienced an outage that substantially affected companies and people around the globe. Yet major online sites in the US saw minimal or no direct performance impact from the event.

How can this be? How can a major domain registrar and DNS provider go offline for 4-5 hours and have only an indirect or minimal impact on some of the largest online companies in the US?

Large online companies see control and management of DNS as a critical part of their infrastructure. They either maintain direct control of their DNS namespace or contract its management out to well-known third parties such as CDNs and specialized global DNS providers. As a result, the primary domain names for these firms were less likely to be affected by the loss of GoDaddy's DNS servers.

Smaller businesses, groups, and individuals who do not have the resources to hire people or contract with firms to take over their DNS management rely on the domain registrars or hosting providers they have selected to provide this service for them. For a firm that is doing business online or delivering third-party services to other companies, losing DNS service is the same as being thrown off the internet, if only for a short time.

The outage at GoDaddy rippled across the internet this week and had an impact on thousands (potentially millions) of individuals, small businesses, and third-party service providers.

The protection that many of the top online businesses built into their infrastructure to prevent such a catastrophe didn’t leave their applications completely unaffected. The interconnected nature of modern online applications, with their increasing reliance on third-party service providers, from CDNs to customer service surveys, means that protecting your own DNS doesn’t protect you from an outage that someone else suffers.

Data collected for the Gomez US Retail Product Order, Retail Homepage, and Top Traffic benchmarks shows three key trends during the outage:

  1. Overall aggregated response times were elevated
  2. Aggregated DNS Lookup times increased by 3X to 4X
  3. Variability of DNS results increased from less than 0.4 seconds to nearly 4.0 seconds at the peak of the issue.

Figure: US Retail Benchmarks. Response Time, Availability, and DNS Metrics for September 10, 2012
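
The DNS metrics listed above can be approximated on a small scale. The following is a minimal Python sketch, not the method used for the Gomez benchmarks: it times individual DNS lookups and summarizes their mean and spread. The function names are hypothetical, and treating "variability" as the population standard deviation is an assumption, since the benchmark's exact definition is not stated here.

```python
import socket
import statistics
import time


def time_dns_lookup(hostname, resolver=socket.getaddrinfo, clock=time.perf_counter):
    """Return the seconds taken to resolve hostname, or None if resolution fails.

    Note: the operating system or a local resolver may cache answers, so
    repeated lookups of the same name can look faster than a cold lookup.
    """
    start = clock()
    try:
        resolver(hostname, 80)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None
    return clock() - start


def summarize_lookups(samples):
    """Summarize lookup timings; None entries are counted as failures."""
    ok = [s for s in samples if s is not None]
    return {
        "lookups": len(samples),
        "failures": len(samples) - len(ok),
        "mean_s": statistics.fmean(ok) if ok else None,
        # Assumption: variability is reported as population standard deviation.
        "stdev_s": statistics.pstdev(ok) if ok else None,
    }


def slowdown(baseline_samples, incident_samples):
    """Ratio of mean lookup time during an incident to the baseline mean."""
    baseline = summarize_lookups(baseline_samples)["mean_s"]
    incident = summarize_lookups(incident_samples)["mean_s"]
    if not baseline or incident is None:
        return None
    return incident / baseline
```

Collecting samples with `time_dns_lookup` for the same hostnames on a normal day and during an incident, then passing both lists to `slowdown`, gives a rough equivalent of the "3X to 4X" comparison above.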

But not all firms were affected equally. Some firms saw a much larger effect on performance as tracked by the same metrics. In this graph, isolating the aggregated metrics for some of the most affected firms shows that response times, DNS times, and DNS variability were substantially higher, and availability more heavily affected, than for the entire benchmark population.

Figure: Most Affected US Retail Sites. DNS Outage, September 10, 2012

The selection of third-party providers appears to have been critical in this instance. Firms that used vendors that relied on the affected DNS provider suffered a far greater performance effect than those that used other vendors delivering the same service. This raises new concerns, both for companies selecting third parties and for the providers of these third-party services as they design their infrastructure.
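
One way to make this dependency visible is to group the third-party hosts an application uses by the DNS provider behind them. The sketch below is illustrative only: the function names and the sample hostnames are hypothetical, the nameserver data is assumed to have been gathered separately (for example with `dig NS`), and using the last two labels of a nameserver hostname as the provider key is a crude assumption that breaks for suffixes such as `co.uk`.

```python
from collections import defaultdict


def provider_of(nameserver):
    """Crude provider key: the last two labels of a nameserver hostname."""
    labels = nameserver.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def hosts_by_dns_provider(nameservers_by_host):
    """Map each DNS provider to the third-party hosts that depend on it."""
    grouped = defaultdict(set)
    for host, nameservers in nameservers_by_host.items():
        for ns in nameservers:
            grouped[provider_of(ns)].add(host)
    return {provider: sorted(hosts) for provider, hosts in grouped.items()}


def single_provider_hosts(nameservers_by_host):
    """Hosts whose nameservers all belong to one provider (no DNS redundancy)."""
    return sorted(
        host
        for host, nameservers in nameservers_by_host.items()
        if len({provider_of(ns) for ns in nameservers}) == 1
    )


# Hypothetical inventory of third-party hosts and their nameservers.
inventory = {
    "survey.vendor-a.example": ["ns1.dns-one.example.", "ns2.dns-one.example."],
    "cdn.vendor-b.example": ["ns1.dns-one.example", "ns1.dns-two.example"],
}
print(hosts_by_dns_provider(inventory))
print(single_provider_hosts(inventory))
```

Hosts returned by `single_provider_hosts` are the ones that would disappear entirely if that one DNS provider went offline, which is the situation the most affected firms appear to have been in.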

The two primary takeaways from this event are:

  1. Understand the true impact of third-party content on performance and user experience. It is critical for an organization to know whether a third-party service outage will be an annoying distraction or could shut the application down, with a matching loss in revenue and brand value.
  2. A redundant and scalable infrastructure design is a critical competitive differentiator when selecting third-party content providers. Third-party services that can clearly demonstrate to their customers that steps have been taken to prevent their service from causing performance degradation or application outages will show that they want to be a partner in the success of their customers’ businesses.

While discussions on the root cause of this outage may continue for months, it has clearly highlighted that the failure of a critical internet resource can ripple outward, affecting even those firms that would, at first glance, appear to have taken steps to protect themselves from just such an occurrence.