How often do you deploy new software? Once a month, once a week or every hour? The more often you deploy, the smaller your changes will be. That’s good! Why? Because smaller changes tend to be less risky, since it’s easier to keep track of what has really changed. For developers, it’s certainly easier to fix something you worked on three days ago than something you wrote last summer. An analogy from a recent conference talk by AutoScout24 is to think about your release as a container ship, where every one of your changes is a container on that ship.
If all you know is that you have a problem in one of your containers, you’d have to unpack and check all of them. That doesn’t make sense for a ship, and neither does it for a release. But that’s still what happens quite frequently when a deployment fails and all you get is “it didn’t work.” In contrast, if you were shipping just a couple of containers, you could replace your giant, slow-maneuvering vessel with something faster and more agile, and if you’re looking for a problem, you’d only have to inspect a handful of containers. While adopting this practice in the shipping industry would be rather costly, this is exactly what continuous delivery allows us to do: deploy more often, get faster feedback, and fix problems faster.
A great example is Amazon, who shared their success metrics at Velocity.
However, even small changes can have a severe impact. Examples?
- Memory Leaks in Production: introduced by a poorly tested remote logging framework downloaded from GitHub
- Performance Impact of Exceptions in Ops: Ops and Dev did not follow the same deployment steps (no automation scripts), resulting in thousands of exceptions that maxed out the CPU on all app servers
Extending your Delivery Pipeline
Even small changes need to be tracked, and their impact on overall software quality must be measured along the delivery pipeline, so that your quality gates can stop even the smallest change from causing a huge issue. The examples above could have been avoided by automatically looking at the following measures across the delivery pipeline and stopping the delivery when “architectural” regressions are detected:
- Number of DOM manipulations
- Memory usage or object churn rate per transaction
- Number of exceptions, database queries or log entries
In a series of blog posts, I will introduce metrics that you should measure along your pipeline as an additional quality mechanism to prevent problems like those listed above. It is important that:
- Developers get these measurements in the commit stage
- Automation Engineers measure them for the automated unit and integration tests
- Performance Engineers add them to the load testing reports in staging
- Operations verify how the real application behaves after a new deployment in production
For each metric I introduce, I’ll explain why it is important to monitor it, which types of problems can be detected and how Developers, Testers and Operations can monitor these metrics.
Metric: # of Requests per End User Action
How many web requests does it take to load your homepage, execute your search or perform another critical function of your application? You can use tools such as Dynatrace AJAX Edition, Firebug, Speed Tracer or network sniffing tools such as Fiddler to figure that out.
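If your tool can export an HTTP Archive (HAR) file, as many browser developer tools and proxies such as Fiddler can, you can also count the requests per page yourself. The following is a minimal Python sketch, assuming a HAR 1.2 document in which each entry’s `pageref` points to the `id` of the page that triggered it; the function name `requests_per_page` and the sample data are hypothetical.

```python
from collections import Counter


def requests_per_page(har: dict) -> dict:
    """Count HTTP requests per page in a parsed HAR 1.2 document.

    Entries are grouped by their "pageref" field, which refers to the "id"
    of an object in log.pages. Entries without a pageref are counted under None.
    """
    log = har["log"]
    # Map page ids to human-readable titles where available.
    titles = {page["id"]: page.get("title", page["id"]) for page in log.get("pages", [])}
    counts = Counter()
    for entry in log["entries"]:
        ref = entry.get("pageref")
        counts[titles.get(ref, ref)] += 1
    return dict(counts)


if __name__ == "__main__":
    # Tiny inline sample; for a real export, parse the file with json.load().
    sample = {
        "log": {
            "pages": [{"id": "page_1", "title": "Home"}, {"id": "page_2", "title": "Search"}],
            "entries": [
                {"pageref": "page_1"}, {"pageref": "page_1"}, {"pageref": "page_1"},
                {"pageref": "page_2"}, {"pageref": "page_2"},
            ],
        }
    }
    print(requests_per_page(sample))  # {'Home': 3, 'Search': 2}
```

Running this for the same click path before and after a change gives you exactly the before/after comparison described in the story that follows.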
But why should you look at this number? Last year we planned to upgrade Confluence, the software that powers our community portal, hoping that the latest version would be much faster (as promised in the release notes) and that we could leverage some of its new interactive features. We ran a test before and after the upgrade on our staging environment and looked at metrics such as the number of simulated users and the number of requests executed. The first test showed that about 200 requests were executed per user, where each user clicked through 4 main pages of our system.
We had to abort the 2nd load test due to too many errors caused by overloaded servers. The most interesting observation was that the same 4 steps now took 400 requests per user: TWICE the number.
How to Measure on Dev Workstations
Developers can look at these metrics on their local workstations. They probably already know tools like Dynatrace AJAX Edition, Firebug or Speed Tracer. The following is a screenshot of one of these tools that highlights the key metrics for a single page – in this case it is the homepage of our community after the upgrade:
Web developers in particular should be familiar with the best practices of Web Performance Optimization. If they see measures like these, they should think twice before checking in their code.
How to Measure in Continuous Integration
Automation Engineers can use the same tools listed above in combination with automated testing tools such as Selenium, Silk, QuickTest, etc.
The key is to capture these metrics for every test that is executed on every build and automatically identify regressions so that your CI build actually fails in case your number of requests jumps. The following is a screenshot from Dynatrace that automatically captures and analyzes these metrics for you and also alerts in case of a regression.
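Whatever tool captures the numbers, the regression check itself can be sketched in a few lines. The following Python sketch assumes your test harness reports the number of web requests per test as a dictionary for the previous and the current build; the function names, the 10% tolerance and the test names are hypothetical. It returns an exit code instead of exiting, so a CI step could end with `sys.exit(ci_exit_code(previous, current))`, since most CI servers treat a non-zero exit code as a failed build.

```python
def find_request_regressions(baseline: dict, current: dict, tolerance: float = 0.10) -> list:
    """Return (test, old, new) tuples for tests whose request count grew by
    more than `tolerance` (a fraction) compared to the baseline build.

    Tests missing from the baseline are skipped: there is nothing to compare yet.
    """
    regressions = []
    for test, new in sorted(current.items()):
        old = baseline.get(test)
        if old is None:
            continue
        if new > old * (1 + tolerance):
            regressions.append((test, old, new))
    return regressions


def ci_exit_code(baseline: dict, current: dict, tolerance: float = 0.10) -> int:
    """Print every regression and return 1 if there was any, else 0."""
    problems = find_request_regressions(baseline, current, tolerance)
    for test, old, new in problems:
        print(f"REGRESSION {test}: {old} -> {new} requests")
    return 1 if problems else 0


if __name__ == "__main__":
    # Hypothetical request counts per test for the previous and the current build.
    previous_build = {"homepage": 50, "search": 40, "login": 20}
    current_build = {"homepage": 100, "search": 41, "login": 20}
    print("exit code:", ci_exit_code(previous_build, current_build))
```

The tolerance keeps small, expected fluctuations (one extra icon, for example) from breaking the build, while a doubling like the one in the Confluence upgrade fails it immediately.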
How to Measure in Load Testing
Performance Engineers use load testing tools, which typically provide this metric out of the box: besides metrics such as # of Transactions or # of Pages, they usually also report the number of actual web requests. If your tool doesn’t provide that data, you can analyze the number of requests on the server side. Looking at the web server logs is one option, but it makes it hard to figure out which requests came in through which page load. Application Performance Management solutions typically provide a “Page Context” or “User Action” context that allows assigning requests on your web or app server to an individual real or simulated user.
The following is a screenshot from Dynatrace that provides this data for each individual load test step – making it easy to figure out how many web requests are actually processed in the different stages of the test.
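If you only have web server logs, one rough approximation is to attribute each request to the page named in its `Referer` header. The following Python sketch assumes the Apache/Nginx “combined” log format; the regex, function name and sample lines are illustrative, and the result is only a heuristic: browsers may omit or trim the referer, and navigating from page A to page B sends A as the referer, so B’s own HTML request is counted under A. This is exactly the gap that a “Page Context” in an APM solution closes.

```python
import re
from collections import Counter

# "Combined" log format: host ident user [time] "request" status size "referer" "user-agent"
COMBINED_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def requests_by_referring_page(lines):
    """Approximate requests per page load by grouping on (client, Referer).

    Lines that do not parse, or that carry no referer ("-"), are ignored.
    """
    counts = Counter()
    for line in lines:
        match = COMBINED_RE.match(line)
        if not match or match["referer"] in ("", "-"):
            continue
        counts[(match["client"], match["referer"])] += 1
    return counts


if __name__ == "__main__":
    sample_log = [
        '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /home HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
        '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /app.js HTTP/1.1" 200 900 "https://example.com/home" "Mozilla/5.0"',
        '10.0.0.1 - - [10/Oct/2023:13:55:37 +0000] "GET /logo.png HTTP/1.1" 200 300 "https://example.com/home" "Mozilla/5.0"',
    ]
    for (client, page), n in requests_by_referring_page(sample_log).items():
        print(client, page, n)  # 10.0.0.1 https://example.com/home 2
```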
Even more interesting than observing a single load test is comparing two tests with each other, e.g., two tests executed against two different builds, to identify the changes. The following table shows the difference in the number of requests executed between two tests with identical transactions. The current test shows a severe performance degradation and a much higher number of requests executed for the same steps:
How to Measure in Production
So what does this mean for Operations? As we can’t test every page and every possible user interaction, it is important to measure the same metric in “the real world.” Having a Real User Monitoring solution in place gives you metrics such as Number of Visitors and Number of User Actions, e.g., “Loading a Page” or “Clicking on a Link”. Connecting each User Action with the web requests the browser actually issued gives you exactly the metric we’re looking for. The following screenshot shows what this measure looks like at an unnamed shopping site:
The jump we see at 7:40 can now trigger a rollback if end-user response time is also impacted or people start complaining. It can also speed up the fix: providing this information to engineering shows what is different between production and development, so that a new version can be deployed as fast as possible (Roll Forward).
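A jump like the one at 7:40 can also be flagged automatically. The following minimal Python sketch assumes your RUM solution can export, per time interval, the number of user actions and the number of web requests they triggered; it flags an interval when requests per user action exceed the median of the preceding intervals by a chosen factor. The function name, window size, factor and data are hypothetical and not a feature of any specific product.

```python
from statistics import median


def detect_jumps(samples, window=6, factor=1.5):
    """Flag intervals whose requests-per-user-action ratio exceeds `factor`
    times the median of the previous `window` intervals.

    `samples` is a chronological list of (label, user_actions, requests) tuples.
    Intervals without user actions are skipped. Returns (label, ratio) tuples.
    """
    ratios = []  # ratios of all intervals seen so far
    alerts = []
    for label, actions, requests in samples:
        if actions == 0:
            continue
        ratio = requests / actions
        history = ratios[-window:]
        # Only judge once a full window of history is available.
        if len(history) == window and ratio > factor * median(history):
            alerts.append((label, round(ratio, 1)))
        ratios.append(ratio)
    return alerts


if __name__ == "__main__":
    # Hypothetical 5-minute RUM buckets: (time, user actions, web requests).
    rum = [
        ("07:00", 100, 2000), ("07:05", 110, 2300), ("07:10", 105, 2100),
        ("07:15", 120, 2400), ("07:20", 115, 2350), ("07:25", 100, 2050),
        ("07:30", 110, 2200), ("07:35", 108, 2160), ("07:40", 112, 4480),
        ("07:45", 109, 4400),
    ]
    print(detect_jumps(rum))  # [('07:40', 40.0), ('07:45', 40.4)]
```

The median makes the baseline robust against a single noisy interval. Because flagged intervals also enter the history, a change that persists eventually becomes the new baseline, which is why such an alert should lead to a decision: roll back or roll forward.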
Make sure to also monitor 3rd parties: a jump in the number of requests can also mean that you switched your ad provider or you have a problem with your CDN configuration. Examples?
- When your CDN doesn’t deliver what it promises: a misconfigured CDN may not deliver your images with the correct browser cache headers, or it may forward too many requests to your data center instead of serving them from the cache
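To tell whether a jump comes from your own code, a third party or the CDN, it helps to break the requests of a page load down by host and to check the caching headers of static assets. The following is a minimal Python sketch, assuming you already have a list of (URL, response headers) pairs for one page load, for example from a HAR export or a proxy; the function names, the first-party domain and the sample data are hypothetical. The cache check is a deliberate simplification of HTTP caching rules: a positive `Cache-Control: max-age` (which takes precedence over `Expires`) or an `Expires` header counts as cacheable.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def requests_by_host(urls):
    """Count the requests of one page load per host name."""
    return Counter(urlsplit(url).hostname for url in urls)


def third_party_counts(host_counts, first_party_domain):
    """Keep only hosts outside the first-party domain and its subdomains."""
    return {
        host: n
        for host, n in host_counts.items()
        if host and host != first_party_domain and not host.endswith("." + first_party_domain)
    }


def lacks_browser_caching(headers):
    """Simplified check: True if the response gives the browser no explicit
    freshness lifetime (no-store/no-cache, max-age=0, or no max-age/Expires)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    cache_control = lowered.get("cache-control", "").lower()
    if "no-store" in cache_control or "no-cache" in cache_control:
        return True
    match = re.search(r"\bmax-age=(\d+)", cache_control)
    if match:
        # max-age takes precedence over Expires when both are present.
        return int(match.group(1)) == 0
    return "expires" not in lowered


if __name__ == "__main__":
    page = [
        ("https://www.example.com/app.css", {"Cache-Control": "no-store"}),
        ("https://cdn.example.com/logo.png", {"Cache-Control": "public, max-age=86400"}),
        ("https://cdn.example.com/hero.jpg", {}),
        ("https://ads.adprovider.test/banner.js", {"Expires": "Thu, 01 Jan 2099 00:00:00 GMT"}),
    ]
    hosts = requests_by_host(url for url, _ in page)
    print(third_party_counts(hosts, "example.com"))  # {'ads.adprovider.test': 1}
    print([url for url, h in page if lacks_browser_caching(h)])
```

Tracking the per-host counts over time shows whether new requests come from your own servers, your CDN or a new ad provider.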
What Does This Mean for You?
This is the first metric, number of requests per end user action, that is important along the delivery pipeline. Here are the key takeaways for your specific role:
- Performance Engineers: Make sure to test your applications from real browsers and different locations. Only then can you make sure that you are testing all involved components such as 3rd party providers.
- Production and Business: Not every use case can be tested – therefore it is important to have this type of monitoring in place. Report this data to your engineering team so that they learn about the impact of implementation changes. Also make sure to understand the impact of 3rd party components.
The next metrics we are going to look into are server-side architectural metrics such as # of Database Statements, # of Exceptions or # of Log Messages written. Stay tuned, and feel free to comment below with your own metrics.