Welcome from Velocity 2015 – “Live” from Santa Clara. A lot of folks who are interested in “Building resilient systems at scale” (the motto of this year’s conference) made it to the conference venue. For those who couldn’t make it, I hope you find this blog useful as a little summary of what you have missed. Obviously we – Harald Zeitlhofer (@hzeitlhofer) and Andreas Grabner (@grabnerandi) – can’t be in every session, but we try to do a good job of collecting feedback and posting it here. We will post updates at the beginning of this blog post with a little timestamp.

Wed, 3:30pm: How to convince your CFO to invest in performance

by Colin Bendell (@colinbendell) from Akamai
Colin started his story with some important messages:
  • 71% of mobile phone users expect websites to load as quickly as on a desktop
  • 57% of disappointed mobile web users will not recommend the site
  • 43% of disappointed mobile web users are unlikely to return to the site
His key points on how to show why performance matters:
  • create pride: show your users’ experience
  • compare yourself with your competition
  • use filmstrips in your demonstration to highlight the timeline
  • tell other businesses’ success stories
    • the Walmart story: every additional second in response time decreased revenue by 10%
    • Tammy Everts (@tameverts): 2 seconds faster boosted conversions by 66%
Colin used a couple of calculation models to show the results of performance optimization in numbers, one of them a Monte Carlo simulation.
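To make that idea concrete, here is a minimal sketch of what such a simulation could look like – not Colin’s actual model. It samples page-load times from a measured RUM distribution, applies an assumed conversion curve, and compares projected revenue with and without a speed-up; all the numbers (conversion curve, order value, the 500 ms improvement) are made up for illustration.

```js
// A minimal Monte Carlo sketch (not Colin's actual model)
function conversionRate(loadTimeMs) {
  // assumption: conversion probability decays as the page gets slower
  return Math.max(0, 0.05 - 0.00001 * loadTimeMs);
}

function projectRevenue(rumLoadTimes, improvementMs, visits, avgOrderValue, runs) {
  var total = 0;
  for (var r = 0; r < runs; r++) {
    var revenue = 0;
    for (var v = 0; v < visits; v++) {
      // draw a load time from the observed distribution, minus the improvement
      var sample = rumLoadTimes[Math.floor(Math.random() * rumLoadTimes.length)];
      if (Math.random() < conversionRate(sample - improvementMs)) {
        revenue += avgOrderValue;
      }
    }
    total += revenue;
  }
  return total / runs; // average revenue per simulated period
}

var rum = [800, 1200, 1500, 2200, 3000, 4500];          // sample RUM load times in ms
var before = projectRevenue(rum, 0, 10000, 80, 200);    // status quo
var after  = projectRevenue(rum, 500, 10000, 80, 200);  // 500 ms faster
console.log('Projected revenue uplift per period:', Math.round(after - before));
```
Running many iterations gives a distribution of outcomes rather than a single absolute number, which fits the advice below to show directionality instead of making absolute claims.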
Some notes about revenue modelling:
  • use revenue modelling to show directionality
  • avoid absolute claims about performance
  • consider regression analysis to understand your probability inputs
  • know your user performance distribution first (you are using RUM – right?)
Summary
  • performance correlates to engagement
  • make performance part of your culture
    • understand the user experience and leverage empathy
  • project cash flow savings from performance
    • applies to on-premise and SaaS
  • project revenue impact from performance
And always keep in mind that performance not only makes users happy but also makes business sense!

Wed, 3:30PM: The principles of Microservices by Sam Newman

Sam’s (@samnewman) definition of Microservices: small, autonomous services that work together.
I started taking notes on his talk but decided to simply point to his book, which – if it is just partially as interesting as his talk – is something people should read if they are interested in how to architect systems leveraging Microservices: Building Microservices on Amazon.

Summary: Here are his main Principles of Microservices, which he elaborated on in great detail: Modelled Around Business Domain, Culture of Automation, Hide Implementation Details, Decentralize All The Things, Deploy Independently, Consumer First, Isolate Failure, Highly Observable

Wed, 1:30pm: Building an effective observability stack

by Laine Campbell (@lainevcampbell) and Alex Lovell-Troy
In the first part of the tutorial Laine Campbell talked about how to build an effective observability stack. In order to improve a system you first have to understand it. Operational visibility/observability is essential and should follow some principles:
  • store business and operations data together
  • store at low resolution for core KPIs
  • support self-service visualization
  • keep your architecture simple and scalable
  • democratize data for all
  • collect and store once
Observability provides a lot of benefits:
  • high trust and transparency
  • continuous deployment
  • engineering velocity
  • happy staff (in all groups)
  • cost-efficiency
Laine pointed out the problems of traditional monitoring, where data is collected multiple times, in multiple places, and at much too high a resolution. Too many dashboards confuse the operator. Automating such processes is very hard.
A modern and much better way is to place an agent into your system, collecting both business and system data – only once, and sending the metrics to a central data store.
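As a rough illustration of that “collect once, send to a central store” idea, here is a minimal Node.js sketch of an agent that samples a system metric and a business metric together and ships both to a single telemetry endpoint. The endpoint (metrics.internal/ingest) and the ordersLastMinute() counter are assumptions for illustration, not part of Laine’s stack.

```js
// Minimal agent sketch: collect system and business data together, once,
// and send it to one central data store.
const os = require('os');
const http = require('http');

function ordersLastMinute() {
  return 0; // placeholder for a real business counter from the application
}

setInterval(() => {
  const payload = JSON.stringify({
    timestamp: Date.now(),
    system: { load1m: os.loadavg()[0], freeMemMB: Math.round(os.freemem() / 1e6) },
    business: { ordersLastMinute: ordersLastMinute() }
  });
  const req = http.request({
    host: 'metrics.internal', port: 8080, path: '/ingest', method: 'POST',
    headers: { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(payload) }
  });
  req.on('error', () => {}); // telemetry failures must never hurt the application
  req.end(payload);
}, 10000); // sample every 10 seconds
```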
Why should we measure?
  • support our KPIs
  • pre-empt incidents
  • diagnose problems
  • alerting when customers feel the pain
What should we measure?
  • velocity: how fast can we push new features?
  • efficiency: how cost-efficient is our system? how elastic is our environment?
  • security: how successful are our penetration tests?
  • performance: how performant is our system? what’s our Apdex? (see the sketch after this list)
  • availability: how available is our system? and how about each component? actually here we are back to performance…
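Since Apdex came up as a performance KPI, here is a minimal sketch of how it is computed: responses at or below a target threshold T count as satisfied, responses between T and 4T as tolerating, and everything slower as frustrated.

```js
// Apdex = (satisfied + tolerating / 2) / total samples
function apdex(responseTimesMs, targetMs) {
  var satisfied = 0, tolerating = 0;
  responseTimesMs.forEach(function (rt) {
    if (rt <= targetMs) satisfied++;
    else if (rt <= 4 * targetMs) tolerating++;
  });
  return (satisfied + tolerating / 2) / responseTimesMs.length;
}

console.log(apdex([120, 300, 800, 2500, 95], 500)); // 0.7 for these sample values
```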
Once we’ve collected these metrics, they can be used to support diagnostics through proper alerting on anomalies in latency or utilisation, as well as by identifying dangerous trends.
We can’t measure everything, but measure as much as possible! Alert as little as possible, but as much as required! Consider your transactions from an end-to-end perspective. And when measuring, know your limitations so you don’t crash your own system.
Laine explained that an effective observability stack is built of different components:
  • sensing
  • collecting (with agents or agentless)
  • analysis
  • storage
  • alerting
  • visualisation
In the second part of the tutorial Alex Lovell-Troy introduced their own monitoring stack based on Sensu (agent), RabbitMQ (messaging), Carbon (telemetry storage) and Elasticsearch (event storage), as well as different technologies for visualisation (Grafana, Kibana, Uchiwa). It’s an open source project (https://github.com/pythian/opsviz) and they encourage everyone to join the project!
Summary: a great introduction to the importance of application performance management with aspects of efficient monitoring, centralised logging, as well as proper alerting and visualisation.

Wed 1:30PM: Metrics, metrics everywhere by Tammy Everts and Cliff Crocker

In case you are not yet following these two performance experts, you should: Tammy Everts (@tameverts) and Cliff Crocker (@cliffcrocker). They started by laying the groundwork, telling us that there is NO SINGLE Unicorn Performance Metric – but I am sure they will tell us which metrics are important and which tools to use 😉

Here are the top bullet points that I took away from their presentation:

  • Who cares about performance? “47% of consumers expect a web page to load in 2s or less” based on Akamai 2009
  • Different personas have different perspectives on what performance is and what matters most: C-Level (1s = $27M) vs Operations (DNS = 144ms) vs Developer (Start render = 1.59s) vs Designer (Hero image render = 2.0s)
  • Basic Explanation on RUM (=Real User Monitoring) vs. Synthetic
  • RUM Metrics: performance by end user type/browser, 3rd party impact, conversion & bounce rates
  • Benefits of Synthetic: Baselining, Comparisons across Builds & Competition, More Diagnostics Data, Nothing to Install
  • Mentioned tools and services such as https://speedcurve.com/ and SPOF-O-Matic for synthetic checks
  • W3C Navigation & Resource Timing explained, with actual JavaScript code examples to get things such as DNS, rendering, load time, slowest resources, … (a minimal sketch follows this list)
  • Interesting case studies on Conversion vs Performance, e.g.: Staples: 10% increase in conversion by shaving 1s off the median load time
  • Different types of retailers (General Products vs Specialty Goods) show different bounce and conversion rates
  • Blogs on Performance Budget Metrics and Setting a Performance Budget.
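
I didn’t copy their exact code, but a minimal sketch of pulling such numbers from the W3C APIs looks roughly like this (run it after the load event so all timings are populated):

```js
// Basic load metrics from the Navigation Timing API
var t = performance.timing;
console.log('DNS lookup:        ', t.domainLookupEnd - t.domainLookupStart, 'ms');
console.log('TCP connect:       ', t.connectEnd - t.connectStart, 'ms');
console.log('Time to first byte:', t.responseStart - t.navigationStart, 'ms');
console.log('DOM processing:    ', t.domComplete - t.domLoading, 'ms');
console.log('Full page load:    ', t.loadEventEnd - t.navigationStart, 'ms');

// The five slowest resources from the Resource Timing API
performance.getEntriesByType('resource')
  .sort(function (a, b) { return b.duration - a.duration; })
  .slice(0, 5)
  .forEach(function (r) { console.log(Math.round(r.duration) + ' ms', r.name); });
```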

Summary: Nice intro for folks that are new to Metrics around WPO (Web Performance Optimization) as well as some interesting use cases around W3C Timings and how the different personas (Ops, Dev, Designer, Biz) can benefit from it.

Wed 11AM: WebPerf Best Practices on SPA by Chris Love

Chris Love (@chrislove) walked us through his experience on one of his SPA (Single Page Application) projects consisting of 400 (!) different views. Showing the current version of the app to a prospect he met at the ball park over a slow “Sprint 3G” connection taught him that you “Can’t Load Everything At Once” and that “Mobile Matters” -> we need to rethink and re-architect that!

Get his slides from slideshare!

Here are some of his Best Practices:

  • Test Over Sprint 3G 🙂 Well – start with the Hotel WiFi or In-Flight WiFi to experience performance over a flaky connection
  • Test on Mobile Browsers as they have many limitations such as “aggressively purge cache”, less memory and weaker processors
  • Developers need to test their code “outside local workstation”: let them feel the pain!
  • Check http://whatdoesmysitecost.com/ to see how “expensive” your site is for your end users
  • “Lose Weight” as “The Web is Obese”. Check http://httpArchive.org for current stats on how our websites are growing
  • “Can’t Use Fast Food Frameworks” – frameworks that grew very large/obese over time, e.g.: he crashed his Mobile Safari after browsing Facebook for 10 minutes due to memory leaks in one of the JavaScript libraries it uses
  • “Dump jQuery” & What it means: Faster Load Times, There are Alternatives, Learn to be Modular, …
  • Use “MicroJS Libraries”: Small, Single Focus, Promote Modular Architecture
  • DO NOT USE Polyfills: Do Not Code to Legacy Browsers! Detect Features Available and leverage them! (see the sketch after this list)
  • Responsive Images: Provide different size images for different device types!
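
As a small illustration of “detect features instead of polyfilling”, here is a hedged sketch: only the browsers that actually lack a capability pay the cost of loading fallback code. initApp() and /legacy-support.js are hypothetical names, not from Chris’s slides.

```js
// Feature detection: use native APIs where available, lazy-load a fallback otherwise
if ('querySelector' in document && 'localStorage' in window && 'JSON' in window) {
  initApp(); // modern path: nothing extra to download
} else {
  var s = document.createElement('script');
  s.src = '/legacy-support.js'; // fallback bundle only old browsers ever fetch
  s.onload = initApp;
  document.head.appendChild(s);
}
```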

Architectural Goal is to get a compliment from a user saying: “This feels like a regular/rich client application”. Key pillars are Performance, Modularity, Small Footprint, Scalability, Maintainability, Long Lifespan

Optimize Initial Request: come up with a good Master Layout; inline critical CSS; defer loading requests not needed during startup; Keep the DOM Lean; Leverage Browser Storage to Persist Markup; Use a View Engine keeping it below 14kB; let the View Engine lazy-load additional content.
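
Two of these ideas combined into a minimal sketch, assuming a container element with id “app” and a /views/ endpoint that serves HTML fragments (both hypothetical): persist rendered markup in browser storage so repeat visits can paint immediately, and lazy-load a view only when it is first needed.

```js
// Paint the last known markup instantly on repeat visits
var cached = localStorage.getItem('view:home');
if (cached) {
  document.getElementById('app').innerHTML = cached;
}

// Lazy-load a view on demand and persist it for next time
function loadView(name) {
  return fetch('/views/' + name + '.html')
    .then(function (response) { return response.text(); })
    .then(function (html) {
      localStorage.setItem('view:' + name, html);
      document.getElementById('app').innerHTML = html;
      return html;
    });
}
```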

Summary: Lots of useful best practices, including well-known ones. If you want to get started with SPAs check out his book on Amazon!

Wed 9AM: Service Workers Session by Pat Meenan

Pat Meenan (@patmeenan) kicked off Velocity with a tutorial on Service Workers. Pat works on the Google Chrome team and – with all of his performance experience from WebPageTest – knows best what to look out for when it comes to implementing new features like this with performance in mind. He pointed out that in case you are really interested in Service Workers, feel free to reach out to “The Real Experts” as he put it 🙂 – that would be Alex Russell (@slightlylate) and Jake Archibald (@jaffathecake).

Also – check out Pat’s slides on slideshare as I am not going to copy/paste the stuff from his slides!

Service Workers were new to me. They are a vehicle for you to intercept (“man in the middle”) any type of request the browser makes before it downloads these resources (HTML, JS, CSS, …). This allows you to modify requests or even provide your own version of resources instead of the browser downloading them from the server. There are use cases for both “online” and “offline” applications. Service Workers can leverage services such as a Cache to store elements and provide them in case your users are working offline. A common question was whether this is the regular Browser Cache or a separate Cache. It is in fact a separate Service Worker Cache that you fully control and have to maintain, e.g. you need to take care of removing elements or keeping them up to date. The Service Worker Cache also doesn’t fall under any of the size limitations of the Browser Cache.
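
A minimal sketch of that pattern, assuming an app shell of /, /app.css and /app.js (hypothetical file names): the worker pre-caches the shell at install time and then answers fetches from its own cache first, falling back to the network and storing a copy.

```js
// sw.js – cache-first Service Worker sketch
self.addEventListener('install', function (event) {
  event.waitUntil(
    caches.open('app-v1').then(function (cache) {
      return cache.addAll(['/', '/app.css', '/app.js']); // hypothetical app shell
    })
  );
});

self.addEventListener('fetch', function (event) {
  event.respondWith(
    caches.match(event.request).then(function (cached) {
      return cached || fetch(event.request).then(function (response) {
        var copy = response.clone();
        caches.open('app-v1').then(function (cache) { cache.put(event.request, copy); });
        return response;
      });
    })
  );
});
```
Registering it from the page is a one-liner: `if ('serviceWorker' in navigator) navigator.serviceWorker.register('/sw.js');`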

Some additional use cases Pat was excited about besides “Offline” Apps: Custom Error Pages, SPOF (Single Point of Failure) Prevention, CDN/Origin Failover, Multi-Origin/CDN Optimizations … -> check out the full list in his slides!

A cool use case is pre-fetching resources. An HTTP response header might contain information about suggested resources. As you have access to all the headers in the Service Worker, you can use those suggestions to request the resources from the server while idle, even before the website requests them.
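
A hedged sketch of that idea, using a made-up X-Prefetch response header to carry the server’s suggestions; in a real implementation you would also extend the event’s lifetime so the worker isn’t terminated before caching finishes.

```js
// Prefetch resources the server suggests via a (hypothetical) response header
self.addEventListener('fetch', function (event) {
  event.respondWith(
    fetch(event.request).then(function (response) {
      var hint = response.headers.get('X-Prefetch'); // e.g. "/next.js, /hero.jpg"
      if (hint) {
        var urls = hint.split(',').map(function (u) { return u.trim(); });
        caches.open('prefetch-v1').then(function (cache) { cache.addAll(urls); });
      }
      return response; // the page still gets its response immediately
    })
  );
});
```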

Another is drawing images locally. The example he brought up was creating a large waterfall image. The core data is probably only a couple of kB, but generating the PNG takes excessive resources on the server. Something you can “offload” to the browser by leveraging Service Workers.
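
Canvas isn’t readily available inside a worker, so as an illustrative variation (not Pat’s example code) here is a sketch that intercepts a request for a hypothetical /waterfall.svg, fetches only the small JSON data, and builds the image as SVG in the browser instead of on the server.

```js
// Build the waterfall image locally from a few kB of JSON data
self.addEventListener('fetch', function (event) {
  var url = new URL(event.request.url);
  if (url.pathname !== '/waterfall.svg') return; // let the browser handle everything else

  event.respondWith(
    fetch('/waterfall-data.json') // hypothetical data endpoint
      .then(function (response) { return response.json(); })
      .then(function (entries) {
        var bars = entries.map(function (e, i) {
          return '<rect x="' + e.start + '" y="' + (i * 12) +
                 '" width="' + e.duration + '" height="10" fill="steelblue"/>';
        }).join('');
        var svg = '<svg xmlns="http://www.w3.org/2000/svg" width="1000" height="' +
                  (entries.length * 12) + '">' + bars + '</svg>';
        return new Response(svg, { headers: { 'Content-Type': 'image/svg+xml' } });
      })
  );
});
```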

Monitoring: Very interesting for us, working for an APM company. Leverage Service Workers to report metrics back to your monitoring solution!
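
A minimal sketch of that monitoring idea, with /beacon as a made-up collector endpoint: time every network request in the worker and report the result.

```js
// Report per-request timings from the Service Worker to a monitoring backend
self.addEventListener('fetch', function (event) {
  if (event.request.url.indexOf('/beacon') !== -1) return; // don't measure the beacon itself

  var started = Date.now();
  event.respondWith(
    fetch(event.request).then(function (response) {
      fetch('/beacon', {
        method: 'POST',
        body: JSON.stringify({
          url: event.request.url,
          status: response.status,
          durationMs: Date.now() - started
        })
      });
      return response;
    })
  );
});
```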

Limitations: Service Workers currently don’t work well with the Resource and Navigation Timing APIs. The comment from Pat was: they break them right now, but the team is working on it -> good luck to monitoring tools relying solely on these timings 🙂

Before you get started – check out which browsers actually support which features: Is ServiceWorker ready?