Having a performance budget for a feature is a widely discussed approach that I’ve been proposing for a performance-oriented culture for a long time. In addition to Page Load Time, which is primarily used as a proxy for User Experience, I recommend including key quality, performance and scalability metrics to define, implement and approve new features.
If the business requires improving Page Load from 2.0s to 1.8s to optimize user experience, we have to find ways to speed it up. The only questions that need to be answered are how, where and at what price? It’s comparable to me driving my car from Linz (my hometown) to Salzburg (where I often travel for Friday night salsa dancing!). I typically complete the trip in one hour and 10 minutes. That’s my speed/performance. If I have a late start, I try to make up the time by simply driving faster. The question is: Do I want to pay the price of increased gasoline consumption (and the potential speeding ticket)? It really comes down to: Is it worth the cost to make up for the 10 minutes?
We can speed up both the page and the trip! But what’s the cost in terms of the additional resources (labor time, increased CPU or bandwidth requirements) needed to accomplish this improvement? Will we actually do enough additional business with our users to justify the costs the improvements generate?
Defining a Resource Budget to Supplement the Performance Budget
To address this we need to supplement the performance budget with a resource budget to control how much the X% improvement on speed/performance and usability actually costs from a resource allocation perspective. I recommend adding the following metrics:
- Total Bytes Sent & Received from CDN, from our Cloud and from our Servers: Why? Because we have to pay for every byte transferred!
- Number of Database Rows Inserted and Retrieved: Why? Some DBs (especially cloud offerings) may charge you by the number of transactions or by the number of rows.
- Total Bytes Transferred for (Micro)Service Calls: Why? If you run everything in the cloud you need to factor in these “internal transfers” as well.
- Number of Log Lines created and Total Size of Log Files: Why? You have both storage cost as well as Log Analyzer Tool costs (if, for instance, you use Splunk).
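To make the list above concrete, here is a minimal sketch of how such a resource budget could be written down and checked in code. The class name, field names and units are my own illustrative assumptions, not part of any tool; treat it as a starting point rather than a finished implementation.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class ResourceBudget:
    """Upper limits for one use case. All field names are illustrative assumptions."""
    bytes_transferred: int   # bytes sent and received via CDN, cloud and own servers
    db_rows: int             # rows inserted plus rows retrieved
    service_call_bytes: int  # bytes transferred for internal (micro)service calls
    log_lines: int           # number of log lines created
    log_bytes: int           # total size of those log lines


def over_budget(budget: ResourceBudget, measured: Dict[str, int]) -> Dict[str, int]:
    """Return {metric: amount above the limit} for every metric that exceeds its budget."""
    overruns = {}
    for metric, limit in asdict(budget).items():
        used = measured.get(metric, 0)  # metrics that were not measured count as zero
        if used > limit:
            overruns[metric] = used - limit
    return overruns
```

An empty result means the feature stayed within its budget; anything else tells you which metric to discuss and by how much it overshot.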
Who picks up the tab? You or your users?
You also need to determine who “pays” these costs. “The Cloud” is mentioned several times in the previous list, which means you must pay for the resources consumed. But you must also consider your actual end users/customers, who need to pay for data transfer when they use your services through their mobile devices. Depending on their mobile contract and data plan, it could become a costly endeavor to use your app/service. The user should not feel the urge to first check available data volume before using your app. This applies to both Mobile Web and Mobile Apps. For mobile apps, the cost is not just for downloaded data but also for consumed storage on the device. Not everyone is blessed with GBs of free storage. An app filling up storage, making the device slower, is not a good app.
Example: Submitting an Insurance Claim via Mobile App
Let’s walk through a simple example. Assume we are working for an insurance company, and are implementing a new mobile app for submitting a claim after a car accident. The app should make it easy to log in, create a new claim, and upload images of the damaged vehicle. If this is the main use case for the mobile app, we want to define a performance and resource budget for the whole use case (from opening the app to successfully submitting the claim) as well as for the individual steps.
If I am the business analyst for this primary use case, I want to make sure that it takes the fewest possible steps to accomplish. I want the “Submit Claim” option already on the login screen, and not as one of the many options following the actual login. It’s like going to the ATM of my bank where, on the first screen, there is a default option next to where the PIN is entered saying “Withdraw your default amount”! Eliminating a single step in the process is great for both the performance and resource budget, and will definitely be appreciated by the user.
I also want to make sure that the “Upload Image” step actually compresses the image before uploading it. This will greatly reduce transferred bytes and, therefore, reduce data storage costs as well as the cost of the upload to the end user’s monthly mobile data plan. Before submitting the claim, the user must also enter additional mandatory information including accident location and damaged parts. Some information will be entered through a drop-down menu offering multiple (and maybe dynamic) options per car model. Instead of loading the full list every time the user opens the app, the app should cache the list locally, check only whether an updated list is available, and download it on demand. All of this can take place in the background while the user is taking photos of their vehicle.
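The caching idea for the drop-down list can be sketched in a few lines. This is only an illustration under my own assumptions: the list carries a version string that the server can report cheaply, the cache is a JSON file on the device, and `download` is a hypothetical callable that fetches the full list. A real mobile app would use its platform’s own storage and networking APIs.

```python
import json
from pathlib import Path
from typing import Callable, List, Tuple


def load_options(
    cache_file: Path,
    remote_version: str,
    download: Callable[[], List[str]],
) -> Tuple[List[str], bool]:
    """Return (options, downloaded).

    The full list is downloaded only when no cache exists or when the cached
    version differs from the version the server currently reports.
    """
    if cache_file.exists():
        cached = json.loads(cache_file.read_text(encoding="utf-8"))
        if cached.get("version") == remote_version:
            return cached["options"], False  # cache is current: no transfer needed

    options = download()  # hypothetical fetch of the full list
    cache_file.write_text(
        json.dumps({"version": remote_version, "options": options}),
        encoding="utf-8",
    )
    return options, True
```

The second element of the result makes it easy to count in a test how often the full list was actually transferred, which is exactly the number the resource budget cares about.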
On the server-side implementation, I am interested in how many interactions we have with our backend services when our user navigates through the use case: How much data do we store in the database? How many lines of logs get generated for every action of this user? Let’s keep this to a minimum to optimize our costs of infrastructure (including the costs for log analyzers).
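If you want a quick feel for the log volume per user action before any monitoring tool is in place, a small handler built on Python’s standard `logging` module is enough. This is a sketch under one assumption of mine: each log call passes the current user action through `extra={"action": ...}`. Nothing here is specific to any log analyzer.

```python
import logging
from collections import Counter


class LogLineCounter(logging.Handler):
    """Counts log lines and their formatted size per user action.

    Assumes (hypothetically) that callers tag each record via
    logger.info("...", extra={"action": "submit_claim"}).
    """

    def __init__(self) -> None:
        super().__init__()
        self.lines = Counter()
        self.size_bytes = Counter()

    def emit(self, record: logging.LogRecord) -> None:
        # Records without the tag are grouped under "unknown".
        action = getattr(record, "action", "unknown")
        self.lines[action] += 1
        self.size_bytes[action] += len(self.format(record).encode("utf-8"))
```

Attach it with `logger.addHandler(LogLineCounter())` during a test run and read `lines` and `size_bytes` afterwards to see which action writes the most.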
How to Monitor the Budget?
The ultimate goal is to automate the process of monitoring key metrics for our resource budget. In this scenario I assume our developers or testers create a test script to test this use case. With Dynatrace Free Trial App Mon and UEM installed on both the Mobile App and the Backend Services, we obtain automatic access to the metrics we need:
- Total Transferred Bytes between Device and Our Services
- Requests to 3rd party and CDN Providers
- Number and Type of Database Queries Executed
- Number of Log Messages Written
- Count and Transferred Bytes of Internal Service Calls
You can find these metrics on the captured Visit and the User Action PurePaths captured for every interaction with the app. Here is the screenshot of a Visit captured with Dynatrace. It includes every single interaction with the app and all the metrics captured:
You should start doing sanity checks while the feature is being developed. Review these key metrics with every involved engineer, and make certain everyone is comfortable that this is the most efficient way to implement it. The best time to do this is in Sprint Reviews or even Code Reviews. When demonstrating the new feature, also show how much of your Resource Budget it consumes!
Now it is time to fully automate these steps. At the end of your Test Run, simply query Dynatrace through the REST Interface for these metrics and pull them into your CI Build Stats. Another step would be to use the Dynatrace Test Automation Feature, which automatically examines these metrics, baselines them across builds and notifies you when one of these metrics indicates a negative change.
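The idea of baselining across builds is simple enough to sketch independently of any tool. The snippet below is not how Dynatrace does it; it is a minimal illustration that assumes you have already pulled the metrics of previous builds and of the current build into plain dictionaries. The function name and the 10% tolerance are my own choices.

```python
from statistics import mean
from typing import Dict, List


def find_regressions(
    history: List[Dict[str, float]],
    current: Dict[str, float],
    tolerance: float = 0.10,
) -> Dict[str, float]:
    """Compare the current build against the mean of previous builds.

    Returns {metric: relative increase} for every metric that grew by more
    than `tolerance` (0.10 means 10%) over its baseline.
    """
    regressions = {}
    for metric, value in current.items():
        previous = [build[metric] for build in history if metric in build]
        if not previous:
            continue  # new metric: no baseline to compare against yet
        baseline = mean(previous)
        if baseline > 0:
            increase = (value - baseline) / baseline
            if increase > tolerance:
                regressions[metric] = increase
    return regressions
```

A CI job could fail the build, or at least post a warning, whenever the result is not empty.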
Share Your Performance and Resource Experience: How do you do it?
I hope this blog has provided you with some actionable insights on what can be done beyond examining Performance Metrics when implementing new software services and apps. Please let me know if you think I’ve missed any key metrics. Are there any other practices you apply to make sure your “speedy features” are not costing more to run than they actually earn?