I love the outcry that surrounds a digital service outage. It’s almost as though someone has turned off the oxygen for a few moments. So it comes as no surprise that everyone went a bit crazy when WhatsApp went down for a few hours on Wednesday night. The usual scenario played out: people turned to social media to vent and to laugh, and the media went hunting for useful commentary.
Given our expertise and leadership in the application monitoring space, I was able to shed some light on what might have happened and why we should all find some sympathy for WhatsApp.
Here’s some more context that the media didn’t cover.
1. More than a billion users expect WhatsApp to be 100% perfect, all the time.
Digital businesses face an enormously challenging task today, and it gets more complex every day. One in seven people on the planet uses WhatsApp, often in high volume, and expects it to be highly secure, constantly updated with new features, bug-free and always on.
Doesn’t matter that it’s a free service – we all still expect 100% perfection. No pressure.
2. Rapid release cycles + surging consumer expectation = constant innovation.
It’s widely known now that Amazon is releasing new software updates every 11 seconds, and it’s thanks to these heavyweights that the application performance benchmark keeps shifting higher.
It doesn’t matter how big the WhatsApp development team is; consumers expect the chat service to keep pace with Amazon when it comes to the rate of innovation and the consistency of app performance.
And WhatsApp is doing amazing things. I can see that nearly every week they’re pushing a new version of the app to the stores. What this probably tells us is that they are implementing back end updates in minute or hourly increments. The whole DevOps process is quite remarkable: it never stops, and yet it takes only one small break in the chain for the whole world to know about it.
3. Fail-fast philosophies and speedy problem resolution are non-negotiable.
Pushing new updates through the development and production cycle is always risky, which is why testing and monitoring how changes will impact an app’s performance is so important.
At the first sign of a problem, today’s DevOps teams need to be able to take swift action: roll back and fix the change, or abandon it, especially if users will be impacted or an outage is imminent. It’s a process that needs to be automated and backed with AI so that there’s never any doubt about where the issues lie (even down to individual lines of code) and how they are remediated.
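To make that a little more concrete, here is a minimal sketch in Python of the kind of automated check a team might run against a freshly released version. Everything in it is an assumption made for illustration: the names `HealthSample` and `decide`, the thresholds, and the idea of comparing a new release against the current one by error rate. It is not a description of how WhatsApp or any particular monitoring product works.

```python
from dataclasses import dataclass


@dataclass
class HealthSample:
    """Request and error counts for one monitoring interval (hypothetical shape)."""
    requests: int
    errors: int


def error_rate(samples):
    """Fraction of requests that failed across all samples (0.0 if no traffic)."""
    total = sum(s.requests for s in samples)
    failed = sum(s.errors for s in samples)
    return failed / total if total else 0.0


def decide(baseline, canary, max_ratio=2.0, min_requests=100, floor=0.01):
    """Return 'wait', 'promote' or 'rollback' for a new release.

    baseline: samples from the current stable version
    canary:   samples from the newly released version
    The thresholds are illustrative defaults, not recommendations.
    """
    # Too little traffic on the new version to judge it yet.
    if sum(s.requests for s in canary) < min_requests:
        return "wait"

    # Allow the new version up to max_ratio times the baseline error rate,
    # with a small floor so a near-zero baseline does not trigger on noise.
    allowed = max(error_rate(baseline) * max_ratio, floor)
    return "rollback" if error_rate(canary) > allowed else "promote"


if __name__ == "__main__":
    stable = [HealthSample(requests=1000, errors=5)]
    print(decide(stable, [HealthSample(requests=200, errors=10)]))
    print(decide(stable, [HealthSample(requests=200, errors=1)]))
```

In practice a decision like this would be driven by far richer signals than a single error rate, but the shape is the same: measure, compare against a known-good baseline, and act without waiting for a human to notice.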
All businesses face the same risk.
It’s not hard to recall a time when Google, Microsoft, Netflix, a major bank or a government organisation experienced a hiccup in service and application performance, thanks to changes in the back end.
The reality is these digital ecosystems are hypercomplex. Even the best development and production teams in the world, with the best tools on hand, can still suffer performance problems. It’s how you prepare for and mitigate failure that matters.
But there is a lighter side to the story too.
It’s always a bit of fun watching the light-hearted, entertaining Tweets do the rounds during an outage.
Here are some of my favourites from this week:
Turn off WiFi, turn on WiFi, turn off WiFi, turn on WiFi
Opening Twitter #whatsappdown
— Anshita Singh (@chatterboxanshi) May 4, 2017