Paradigm Shift in Cost Structure and its Risk in Public Cloud Environments

Today, many decision makers tend to favor public cloud environments over their own data centers. Often the reason is easier and quicker scaling to cover daily or seasonal peaks, and the cloud sounds like the cheaper option. In reality, it is largely a different way of paying the bill, one that looks financially attractive at first glance. On closer inspection it turns out to be quite an issue, especially for the financially responsible: cost becomes more complex and less predictable. Let’s have a look at this paradigm shift and how the established approach in enterprise data centers compares to running your business in the public cloud:

The Enterprise Data Center

So far, we have been used to large, planned-in-advance infrastructure investments made with the expectation of being able to handle the expected load. From a financial perspective, a high initial cost paid up-front is not ideal. Finance folks have long offered approaches like leasing and renting to address exactly that pain. However, I don’t want to go into further financial detail here; let’s rather talk about the other aspect I brought up: the planning process, which may roughly look like the following:

  1. Estimate the load on the application for the upcoming period
  2. Determine the expected peak load
  3. Assess resources (hardware) we need to handle the load
  4. Decide whether or not to buy (or rent, lease, etc.) additional resources (hardware)
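The steps above can be sketched as a back-of-the-envelope calculation. All figures here are hypothetical, including the per-server capacity, which in practice would come from load tests:

```python
import math

# Steps 1 and 2: hypothetical load estimates for the upcoming period.
expected_peak_rps = 4_500     # estimated peak load, requests per second
requests_per_server = 400     # what one server can handle (assumed, from load tests)

# Step 3: resources needed to cover the peak.
servers_needed = math.ceil(expected_peak_rps / requests_per_server)

# Step 4: compare against what we already own to decide on a purchase.
servers_owned = 8
servers_to_buy = max(0, servers_needed - servers_owned)

print(servers_needed)   # 12
print(servers_to_buy)   # 4
```

The catch, as discussed next, is that every number in this calculation is a guess made months before the load actually arrives.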

What we are actually doing here is capacity planning. More precisely, it is capacity planning of a rather inertial system—it takes quite a while to have the hardware delivered and up and running—and thus careful planning is key. Still, we face all the familiar issues of capacity planning: if we plan for insufficient hardware, we may not be able to cover the load and might face one or more of the following issues:

  1. Slow application
  2. Unsatisfied users
  3. Loss of revenue

In conclusion, if we are not capable of accurately assessing our upcoming demand, we have to deal with either costly, underutilized resources from over-provisioning or the risk of not being able to serve potential customers. Financially, we risk making unprofitable investments, with the advantage that there is little chance of unpredictable cost occurring (see figure below).

Cost structure of an enterprise data center.

Going Public Cloud

When looking at public cloud environments like Amazon EC2 or Windows Azure, finance folks will be pleased since cost and benefit occur together and within a short time period. However, this is most likely not our primary goal, since we could have achieved it years ago by accepting outsourcing (or similar) offers. Rather, we move or develop our applications on such cloud environments to address the consequences of the inaccurate planning we discussed above:

  1. We want to eliminate the high cost of over-provisioning, i.e. buying a lot of infrastructure to cover peak scenarios and growing demand, and
  2. we want our application to scale easily in case of (unpredicted) increasing load.

The solution to both of these points is on-demand provisioning which is brought to perfection in public cloud offerings. With this, we turned the data center issues into the following consequences:

  1. Response times and user experience will remain stable (see figure below)
  2. Low chance of revenue loss
  3. Potential increase of operational cost

On-demand provisioning: load increases (middle chart) and response time rises (top chart) once both CPUs reach their capacity (bottom chart). After adding a third and fourth CPU (bottom chart) we sustain the load (middle chart) at the desired response time (top chart).
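The scale-out rule the figure describes can be sketched in a few lines. This is a deliberately simplified, hypothetical policy (real cloud auto-scalers are far more configurable), but it captures the idea of reacting to saturated capacity instead of planning for it:

```python
def scale_decision(cpu_utilizations, threshold=0.9):
    """Return how many instances to add when current capacity is saturated.

    A simple doubling rule, mirroring the figure's jump from two to
    four CPUs: if every CPU is above the threshold, add as many
    instances as we currently have; otherwise add none.
    """
    if all(u >= threshold for u in cpu_utilizations):
        return len(cpu_utilizations)
    return 0

print(scale_decision([0.95, 0.97]))  # both CPUs saturated -> add 2
print(scale_decision([0.95, 0.40]))  # headroom left -> add 0
```

Note that this rule happily adds capacity without any notion of budget, which is exactly the new risk discussed below.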

While the first two points suggest we have solved our problems, the third brings a new issue to the table. Since this increase can be completely unplanned and unknown in extent, we potentially face uncalculated operational cost, leading to the same result as in the data center: at the end of the day, we do not earn money. In the data center we had insufficient resources to cover the load and earn the money; in the cloud we have a production cost so high that it can eat up the earnings.

The Cloud Risk Shift: we lose money in both cases. In the data center we could not earn the money; in the cloud our production cost is too high.

Therefore, we still have to do some kind of capacity planning when operating our applications in cloud environments—but driven from a new origin: since we scale just in time, we no longer plan to handle our traffic; we plan to become aware of the cost for the upcoming period. (Of course, this is only true to a certain extent: if we expect to need a vast amount of, say, compute hours for a huge seasonal peak, we would file a reservation in advance, which is still planning to handle traffic.) Planning for public cloud environments could then look like:

  1. Estimate the load on the application for the upcoming period
  2. Assess resources
  3. Calculate cost for resources
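These three steps translate into a simple cost projection. All prices and usage figures below are hypothetical; a real estimate would use the provider’s actual price list:

```python
# Hypothetical per-unit prices (step 3 inputs).
PRICES = {
    "compute_hour": 0.12,          # $ per instance hour (assumed)
    "storage_request": 0.0000004,  # $ per storage transaction (assumed)
    "gb_transfer": 0.09,           # $ per GB outbound (assumed)
}

# Steps 1 and 2: estimated load translated into resource units for one month.
estimated_usage = {
    "compute_hour": 2 * 24 * 30,   # two instances running the whole month
    "storage_request": 50_000_000,
    "gb_transfer": 300,
}

# Step 3: cost for the upcoming period.
monthly_cost = sum(PRICES[r] * units for r, units in estimated_usage.items())
print(round(monthly_cost, 2))  # 219.8
```

The structure looks harmless, but as the next paragraphs show, every factor in it can shift under our feet.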

In public clouds, we do not plan to cover load but to estimate cost.

And here is the bad news: this is going to be quite complex. Public clouds offer a wide range of resources, most of which differ in their cost function. On top of that, our application obviously uses different resources to different extents (resource usage). And to make it even worse, all of this depends on how our users interact with our application—which is hardly plannable.

How cost evolves: the cost functions of our cloud resources, how our application consumes these resources, and how our users interact with our features.

Let’s assume our users search on average ten times (user interaction) before they purchase a product. One search transaction accesses the storage service, on average, 12 times (resource usage), and the storage service has a linear cost function. While the cost function will remain stable (as long as we don’t change the cloud resource and the provider doesn’t change the pricing), the resource usage can easily change from release to release because it depends on the search algorithm, bug fixes, caching strategies, etc.—all usually decided by a single developer. For instance, optimizing the storage structure could lead to one additional storage request per transaction, which adds up when we have a couple of hundred thousand searches a month. The last and hardest factor to control is the user interaction. Suppose a particular product is hyped and all your users just want information and read product reviews without any intention of buying. We want our application to handle that situation smartly without causing immense cost.
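Putting numbers on this chain makes the sensitivity visible. The search and storage ratios are the ones from the example above; the price per storage call and the traffic volume are hypothetical:

```python
searches_per_purchase = 10          # user interaction (from the example)
storage_calls_per_search = 12       # resource usage (from the example)
price_per_storage_call = 0.00001    # linear cost function, $ per call (assumed)

purchases_per_month = 20_000        # hypothetical traffic

storage_calls = purchases_per_month * searches_per_purchase * storage_calls_per_search
storage_cost = storage_calls * price_per_storage_call
print(storage_calls)                # 2400000
print(round(storage_cost, 2))       # 24.0

# One extra storage call per search (e.g. after restructuring the storage)
# raises the count to 13 and the bill proportionally:
storage_cost_after = (purchases_per_month * searches_per_purchase
                      * 13 * price_per_storage_call)
print(round(storage_cost_after, 2))  # 26.0
```

A single developer’s change to one factor moves the monthly bill, and a change in user behavior (more searches per purchase) multiplies through the whole chain the same way.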


That said, it is obvious that we are dealing with a new cost structure in public cloud environments. Whereas we have fixed cost in the data center, we have transaction-based cost in public clouds—in other words, something like an actual cost per user interaction. Financially, we have eliminated the risk of making unprofitable investments but introduced the chance of unpredictable cost.

Also, we can derive new responsibilities for our IT operations from the findings above. In the data center, we need to make sure that our applications are performing within our purchased resources, whereas in the cloud we need to have our applications performing within our planned cost.

The responsibility change of IT operations.

Summary and Outlook

Knowing that cost planning is a tricky matter, it might be better to use pre-calculation only for a rough estimate and to rely on live monitoring and fast release cycles to actually control cost. We now know that the new cost situation originates from addressing the risks we had to deal with in our enterprise data centers, and that it is driven by the cost function of a resource, the resource usage, and the user behavior. Therefore, we strive to monitor the two dynamic factors—resource usage and user behavior—on a transactional basis. Come back to learn how we can manage this new cost structure with a best-in-class Application Performance Management and integrated User Experience Management solution.
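As a rough illustration of what monitoring cost on a transactional basis could mean, here is a minimal sketch. All transaction names, resource names, and prices are hypothetical; a real solution would capture resource usage automatically per transaction:

```python
from collections import defaultdict

# Hypothetical per-unit prices for the resources a transaction may touch.
PRICES = {"storage_call": 0.00001, "compute_ms": 0.0000002}

cost_per_type = defaultdict(float)
count_per_type = defaultdict(int)

def record(transaction_type, resource_usage):
    """Attribute the cost of one user transaction to its type."""
    cost = sum(PRICES[r] * units for r, units in resource_usage.items())
    cost_per_type[transaction_type] += cost
    count_per_type[transaction_type] += 1

# Two monitored transactions: a search and a product-page view.
record("search", {"storage_call": 12, "compute_ms": 80})
record("view_product", {"storage_call": 3, "compute_ms": 20})

avg_search_cost = cost_per_type["search"] / count_per_type["search"]
print(round(avg_search_cost, 6))  # 0.000136
```

Aggregating such per-transaction figures live is what turns the pre-calculated estimate into actual cost control.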

Additional Resources

Join our live webcasts to learn more about best practices: the next scheduled one is in German on June 21, and the next English one will follow in August—please stay tuned. Also, feel free to download the recording of our last webcast.

Daniel is passionate about application performance. He helps organizations around the globe to implement a modern, real user centric monitoring approach. Daniel has more than a decade of experience in software engineering in multiple industries and languages. He enjoys traveling, rare beef and never forgets to bring his camera. Reach him at @d_kaar