On November 12th, 2014, DoubleClick started having an issue delivering ads. This was seen by Dynatrace’s Outage Analyzer, a Big Data application which captures millions of domain requests from tens of thousands of tests run from the Dynatrace Synthetic Network.

Dynatrace Outage Analyzer showing DoubleClick event and impact

The issue was seen across almost every industry vertical that Dynatrace monitors (Automotive, Social Networking, Travel, Digital TV, Sports, News, Financial Advice, Retail, etc.). While some industries were impacted more than others due to their exposure to DoubleClick ad services, the breadth of the impact shows how pervasive ad delivery is as a source of revenue across industry verticals.

Dynatrace Synthetic Benchmarks showing impact across industry verticals

Reports about the issue started to come in from social media (Twitter) and news media. Using Dynatrace, I found that over 3000 websites were impacted by this issue and that the fault traced back to a problem within DoubleClick’s own infrastructure.

Dynatrace Synthetic’s Root Cause Analyzer (RCA) showing issue identified as being caused by DoubleClick calls

Dynatrace Synthetic’s Root Cause Analyzer (RCA) identified the issue immediately and allowed our customers to understand that it was not a malicious attack on DoubleClick’s network.

Dynatrace Synthetic Object Water Fall Chart showing DoubleClick error detail

Dynatrace Synthetic found the issue to be 502 errors coming from requests for DoubleClick ads. This tells us that it was not a public internet issue (no DNS problems, no routing problems, etc.) but an issue with DoubleClick at the application level.
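To make that reasoning concrete, here is a minimal sketch in Python of how a single failed request might be sorted into a failure layer. It is not how Dynatrace's Root Cause Analyzer works internally; the function name `classify_failure` and its three inputs are hypothetical, chosen only to illustrate why an HTTP 502 points away from DNS or routing: the name resolved, the connection succeeded, and a server on the provider's side answered with an error.

```python
from typing import Optional


def classify_failure(
    dns_resolved: bool,
    tcp_connected: bool,
    http_status: Optional[int],
) -> str:
    """Return a rough failure layer for one request (illustrative only)."""
    if not dns_resolved:
        # The hostname never resolved, so the request never left DNS.
        return "dns"
    if not tcp_connected:
        # The name resolved but no connection was made: routing or network.
        return "network"
    if http_status is None:
        # Connected, but no HTTP response came back.
        return "no-response"
    if 500 <= http_status <= 599:
        # A server answered with an error, so the network path was fine.
        return "application"
    if 400 <= http_status <= 499:
        return "client-error"
    return "ok"


# A 502 implies DNS and the connection both worked.
print(classify_failure(True, True, 502))
```

Under these assumptions, a request that resolved and connected but returned 502 is classified as an application-level failure, which matches the conclusion above.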

So, how did this impact individual companies who use DoubleClick? DoubleClick is an online ad network. Online ad revenue is a substantial source of hard revenue for online media companies and advertisers. Any impact to ad networks (DoubleClick being the largest) leads to a substantial loss of potential revenue, as ad impressions are not happening. An issue like this that also slows sites is doubly impactful, as users will get frustrated with slow pages and abandon the site they are on, leading to an additional loss of ad impressions.

During this event, Dynatrace Real User Monitoring was running on a customer’s site and captured the real user impact. Here we can see that during the event the perceived render time of the pages was not impacted much; however, the total page load time was impacted significantly.

Dynatrace Real User Monitoring showing traffic metrics during DoubleClick outage event

When we look at the third parties from a real user perspective, we can see a dramatic increase in occurrences where DoubleClick requests became some of the slowest user actions on the site, as well as the impact this had on all of the user actions that called DoubleClick content.

Dynatrace Real User Monitoring Third Party View showing DoubleClick impact on real user page actions

Here is the “so what”. If a company’s main source of revenue comes from online ads, can it be more proactive in how it manages its most important source of revenue? The answer is yes, absolutely. By using Synthetic and Real User Monitoring, organizations can be much more proactive in how they get notified and how they react to an event like this. For example, in this case a Dynatrace Incident could have been attached to a business transaction monitoring the company’s most important third party (the ad network). That incident could have proactively notified the company that an issue was underway. A company that was proactively managing this could switch its ads to another ad network (like AdWords, Yahoo, AdCenter, Zedo, Infolinks, 24/7, etc.). By switching to another ad network, the company could protect its online ad revenue as opposed to losing hundreds of thousands of dollars (if not more) by not reacting to the event.

Having procedures in place to switch traffic from one ad network to another is a good way of reducing your exposure to an outage like this. However, those procedures only help if you are aware of the issue: knowing that it is happening is what tells you when to trigger them and protect the most important aspects of your site.