Clouds on Cloud Nine: The Challenge of Managing Hybrid-Cloud Environments

Obviously, cloud computing is not just a fancy trend anymore. Quite a few SaaS offerings are already built on platforms like Windows Azure. Others use Amazon’s EC2 to host their complete infrastructure, or at least tap it for additional resources to handle peak load or do number-crunching. Many also end up with a hybrid approach, running distributed across public and private clouds. Hybrid environments in particular make it challenging to manage cost and performance, both overall and within each individual cloud.

In this blog post we discuss the reasons why you may want to move your applications into the cloud, why you may need a hybrid-cloud approach, or why it might be better to stay on-premise. If you choose a cloud or hybrid-cloud approach, the question arises of how to manage your applications across these silos. You want to make sure your move to the cloud pays off in terms of total cost of ownership while delivering at least the same end-user experience as running your applications the way you do today.

Cloud or No Cloud – A Question You Have to Answer

The decision to move to the cloud is not easy and depends on multiple factors: Is it technically feasible? Does it save cost, and how can we manage cost and performance? Is our data secure with the cloud provider? Can we run everything in a single cloud, or do we need a hybrid-cloud approach, and how do we integrate our on-premise services?

Another question is which parts of your application landscape benefit from a move into the cloud. For some it makes sense; for others it will not. A further often-heard question is one of trust: we consider it beyond question that any major cloud data center is physically and logically secured to the highest standards. In fact, a cloud data center is potentially more secure than the data centers of many small or medium-sized enterprises. It boils down to how much trust you place in your cloud vendor.

Now, let’s elaborate on the reasons why you may or may not move your applications to the cloud.

Reasons and Options for Pure Cloud

a) If you are a pure Microsoft shop with web applications implemented in ASP.NET and SQL Server as your data repository, Microsoft will steer you toward Windows Azure. You can focus on your application while Microsoft provides the underlying platform, with options to scale, optimize performance using CDNs, leverage single sign-on and use other services.

b) If you have a Java or Python application and don’t want to care about the underlying hardware or the deployment to app and web servers, you may want to go with Google App Engine.

c) If you have any type of application that runs on Linux (or Windows), there is of course the oldest and most experienced player in the cloud computing field: Amazon Web Services. Amazon’s strength is not necessarily PaaS (Platform as a Service) but rather IaaS, as EC2 makes it very easy to spawn new virtual machine instances.

There is a nice overview comparing these three cloud providers: Choosing from the major PaaS providers. (Remember, the landscape and offerings are constantly changing, so make sure to check with the actual vendors on pricing and services.) There are many more providers in the more traditional IaaS space, such as Rackspace, GoGrid and others.

Reasons for Staying On-Premise

Not every application is a good fit for the cloud, and sometimes it simply doesn’t make sense to move your applications from your data centers to a cloud provider. Here are three reasons:

a) Regulatory requirements or the law may stop you from utilizing cloud resources, for instance when you are required to isolate and store data physically on-premise, as in the banking industry.

b) You have legacy applications requiring specific hardware or even software (e.g. a particular operating system); moving them into the cloud can be laborious and thus costly, or simply impossible.

c) It is simply not cheaper to run your applications in the cloud: the cloud provider doesn’t offer all the services you require to run your application on their platform, or managing your applications through the tools provided by the cloud vendor would add complexity.

Reasons for Hybrid Clouds

We have customers running their applications in both private and public clouds, sometimes even across different public cloud providers. The common scenarios are:

a) Cover Peak Load: Let’s assume you operate your application within your own data center and deal with seasonal peaks (e.g. the four weeks of Christmas business). You might consider provisioning additional hardware in the cloud to cover these peaks. Depending on the technologies you use, you may end up using multiple public cloud providers.
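The capacity math behind this kind of peak-load bursting can be sketched in a few lines. All capacities and traffic numbers below are illustrative assumptions, not measurements from any real deployment:

```python
import math

ON_PREM_CAPACITY_RPS = 5000  # assumed requests/sec the private data center handles
CLOUD_INSTANCE_RPS = 400     # assumed capacity of one public-cloud instance


def extra_instances_needed(peak_rps):
    """How many cloud instances to provision beyond on-premise capacity."""
    overflow = max(0.0, peak_rps - ON_PREM_CAPACITY_RPS)
    return math.ceil(overflow / CLOUD_INSTANCE_RPS)


print(extra_instances_needed(4200))  # off-season traffic fits on-premise: 0
print(extra_instances_needed(7200))  # Christmas peak needs 6 extra instances
```

In practice the per-instance capacity differs between providers and instance types, which is one reason such a calculation can point you toward different public clouds for different workloads.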

b) Data Center Location Constraints: In the gambling industry, the law requires data centers to be located in certain countries in order to offer online services there. To avoid building data centers around the globe, and tearing them down again when local laws change, we have seen companies use cloud providers in those countries instead of investing a lot of money up front in their own data centers. Technically this is no different from choosing a traditional hosting company in that country, but a cloud-based approach provides more flexibility. Here again, it may be necessary to combine different cloud providers, as not every provider has data centers in the countries you need.

c) Improve Regional Support and Market Expansion: When companies grow and expand into new markets, they want to serve those markets with the best quality possible. It is therefore common practice to use cloud services such as CDNs, or even to host the application in additional regional data centers of the current cloud providers.

d) Externalize Frontend, Secure Backend: We see this scenario a lot in eCommerce applications, where the critical backend business services are kept in the private data center while the frontend application is hosted in the public cloud. During business/shopping hours it is easy to add resources to cover the additional frontend activity; during off-hours it is easy and cost-saving to scale down instead of having many servers running idle in your own environment.

A Unified View: The (Performance) Management Challenge in Clouds

Combining your Azure usage reports with your Google Analytics statistics at the end of the month, and correlating this with the data collected in your private cloud, is a tedious job and in most cases won’t answer the questions you actually have, which potentially are:

a) How well do we leverage the resources in the cloud(s)?

b) How much does it cost to run certain applications/business transactions, especially when costs are distributed across clouds?

c) How can we identify problems our users have with the applications running across clouds?

d) How can we access data from within the clouds to speed up problem resolution?

Central Data Collection from all Clouds and Applications

At Dynatrace we, along with an increasing number of our customers, run systems (Java, .NET and native applications) across Amazon EC2, Microsoft Windows Azure and private clouds for the reasons mentioned above. To answer the questions raised above, we monitor these applications both from an infrastructure and cloud-provider perspective and from a transactional perspective. To achieve this we…

  • Use the Amazon API to query information about Instance usage and cost
  • Query the Azure Diagnostics Agent to monitor resource consumption
  • Use Dynatrace UEM to monitor End User Experience
  • Use Dynatrace APM across all deployed applications and all deployed clouds
  • Monitor Business Transactions to map business to performance and cost
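Central collection essentially means correlating measurements from all of these sources in one place. The sketch below shows the idea on a toy scale; the record format, cloud names and response times are assumptions for illustration, not actual Dynatrace data structures:

```python
from collections import defaultdict
from statistics import mean

# Each record: (cloud, business transaction, response time in ms).
# In a real deployment these records would come from agents and monitors
# across EC2, Azure and on-premise; here they are hard-coded examples.
records = [
    ("ec2",     "checkout", 420),
    ("azure",   "checkout", 310),
    ("on-prem", "search",    95),
    ("ec2",     "checkout", 380),
]

# Group response times by (cloud, transaction) to get a unified view.
samples = defaultdict(list)
for cloud, txn, ms in records:
    samples[(cloud, txn)].append(ms)

for (cloud, txn), times in sorted(samples.items()):
    print(f"{cloud}/{txn}: {mean(times):.1f} ms avg")
```

The value of a central server doing this across all clouds is that the same transaction can be compared between, say, the EC2-hosted and Azure-hosted parts of the application.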

The following shows an overview of what central monitoring should look like. End users are monitored using Dynatrace UEM; the individual instances in the cloud are monitored using Dynatrace Agents (Java, .NET, native, …) as well as Dynatrace Monitors (Amazon Cost, Azure Diagnostics, …). This data, combined with data captured in your on-premise deployment, is collected by the Dynatrace Server, providing central application performance management:

Getting a unified view of application performance data by monitoring all components in all clouds

Real-Life Cross Cloud Performance Management

Now let’s have a closer look at the actual benefits we get from having all this data available in a single application performance management solution.

Understand your Cloud Deployment

Following every transaction from the end user all the way through your deployed application makes it possible to a) understand how your application actually works, b) see how your application is currently deployed in this very dynamic environment, and c) identify performance hotspots:

Follow your End User Transactions across your hybrid-cloud environment: identify architectural problems and hotspots

Central Cost Control

It is great to get monthly reports from Microsoft, but it is better to monitor your costs online, up to the minute. The following screenshot shows the dashboard that highlights the number of Amazon instances we use and the related costs, giving us an overview of how many resources we are consuming right now:
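The core of such a cost dashboard is simple arithmetic over instance usage. The hourly rates below are placeholder assumptions; a real monitor would pull live instance counts and current pricing from the Amazon API rather than hard-coding them:

```python
# Assumed on-demand hourly rates per instance type (illustrative, not real pricing).
HOURLY_RATE_USD = {
    "m1.small": 0.06,
    "m1.large": 0.24,
}


def running_cost(instance_hours):
    """Total cost given accumulated instance-hours per instance type."""
    return sum(HOURLY_RATE_USD[t] * hours for t, hours in instance_hours.items())


usage = {"m1.small": 100, "m1.large": 50}  # hours consumed this billing period
print(f"${running_cost(usage):.2f}")       # $18.00
```

Recomputing this continuously as instances are spawned and terminated is what turns a monthly bill into an up-to-the-minute cost view.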

Live monitoring of instances and cost on Amazon

Monitor End User Experience

If you deploy your application across multiple cloud data centers, you want to know how the users serviced by each data center are doing. The following screenshot shows the end-user experience for our users in Europe, who should mainly be serviced by Azure’s European data centers:
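One common way to grade regional end-user experience is an Apdex-style score over measured response times. The thresholds and sample data below are assumptions chosen for illustration:

```python
def apdex(samples_ms, satisfied_ms=500, tolerating_ms=2000):
    """Apdex-style score: satisfied users count fully, tolerating users half,
    frustrated users (above the tolerating threshold) not at all."""
    satisfied = sum(1 for s in samples_ms if s <= satisfied_ms)
    tolerating = sum(1 for s in samples_ms if satisfied_ms < s <= tolerating_ms)
    return (satisfied + tolerating / 2) / len(samples_ms)


europe_ms = [320, 480, 900, 2500, 410]  # assumed response times from European users
print(round(apdex(europe_ms), 2))       # 0.7
```

Computing this per region quickly shows whether, for example, European users are actually being served well by the European data centers or are being routed somewhere slower.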

Analyze regional User Experience and verify how well your regional cloud data centers service your users

Root Cause Analysis

If your end users are frustrated by bad performance or errors, you want to know what these problems are and whether they are application- or infrastructure-related. Capturing transactional data from within the distributed application allows us to pinpoint problems down to the method level:

Identify which components or methods are your performance hotspots on I/O, CPU, Sync or Wait

For developers it is great to extract individual transactions including contextual information such as exceptions, log messages, web service calls, database statements and information about the actual hosts (a Web Role in Azure, a JVM in EC2 or an app server on-premise) that executed the transaction:

Dynatrace PurePath works across distributed cloud applications making it easy for developers to identify and fix problems

In particular, the information from the underlying hosts, whether virtual in one of your clouds or physical in your data center, allows you to figure out whether a slowdown was really caused by slow application code or by an infrastructure or cloud-provider problem.

For our Dynatrace users

If you want to know more about how to deploy Dynatrace in Windows Azure or Amazon EC2, check the following resources on the Dynatrace Community Portal (requires Dynatrace Community login): Windows Azure Best Practices, Amazon EC2 Account Cost Monitor, Amazon EC2 FastPack.

Andreas Grabner has 20+ years of experience as a software developer, tester and architect and is an advocate for high-performing cloud-scale applications. He is a regular contributor to the DevOps community, a frequent speaker at technology conferences, and regularly publishes articles. You can follow him on Twitter: @grabnerandi