We’re thrilled to announce the Early Access Program for Dynatrace OpenStack integration! This blog post is the first in a two-part series that explores how Dynatrace supports the monitoring of OpenStack environments.
OpenStack has become quite popular in recent years. Organizations are increasingly opting to build public and private OpenStack clouds for their employees and customers. One reason for the rapid adoption of OpenStack is its vibrant user community, which has fueled OpenStack’s growth and spirit of innovation. By joining the OpenStack community you can contribute your ideas related to requirements definition as well as development. This gives you the power to actively shape the features of the next OpenStack release.
OpenStack is indeed powerful, but it’s also complex. As an OpenStack admin, you know perfectly well that there’s no such thing as a flawless OpenStack cloud deployment. Even more challenging is maintaining smooth operation once your OpenStack cloud is used in a production environment.
Troubleshooting performance issues
Regardless of whether you’re working with a public or private cloud, as an OpenStack administrator you need to contend with a range of challenges. The components most likely to cause problems are:
- OpenStack services
- Supporting technologies like HAProxy, RabbitMQ, and MySQL
OpenStack troubleshooting can be complex and time-consuming because many OpenStack issues are elusive: problems with one OpenStack service can manifest themselves as performance issues within other services. For example, when a user reports an issue with launching a new VM or attaching a Cinder volume, your first thought might be to look into the log files of your Nova and Cinder services. After combing through hundreds of megabytes of log data, you might learn, however, that the root cause of the issue resides within a different OpenStack service or supporting technology (for example, HAProxy, RabbitMQ, or MySQL).
Dynatrace has good news for you OpenStack admins out there. With Dynatrace OpenStack monitoring, you no longer need to spend hours troubleshooting elusive issues within your OpenStack cloud!
Dynatrace provides complete OpenStack monitoring
In contrast to conventional monitoring tools, which typically cover only a single monitoring domain, Dynatrace provides a complete monitoring solution. Dynatrace monitoring covers:
- OpenStack services
- Supporting technologies
- Compute nodes and VMs
- Log analysis
For each of these components, Dynatrace provides automated root-cause analysis to help you identify the sources of problems and resolve issues in a timely manner.
Analyze OpenStack performance
OpenStack pages provide a holistic overview of your entire OpenStack account (see example images below).
(1) See if key components like compute and controller nodes are healthy.
(2) Gain insight into environment dynamics by tracking how the number of running virtual machines evolves over time. An increasing trend may indicate the need for capacity adjustments. Crucial details regarding the number of VMs that have been spawned and their average launch times are also included. If you notice launch times going up, you may want to investigate the reasons why.
(3) The Events section provides details such as the compute node on which each VM was launched or stopped.
(4) The Compute section shows you how well your compute nodes are performing, which virtual machines are currently running on those nodes, and how the VMs contribute to overall resource usage.
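To make the launch-time check from item (2) concrete, here is a minimal Python sketch of one way to spot rising launch times in a series of samples. It is an illustration only and does not show how Dynatrace computes or displays this metric. The function name `launch_time_trend`, the `window` parameter, and the sample values are all hypothetical.

```python
from statistics import mean


def launch_time_trend(launch_times_s, window=3):
    """Return the relative change between the mean of the first `window`
    samples and the mean of the last `window` samples.

    Hypothetical helper for illustration; assumes launch times are
    positive numbers measured in seconds.
    """
    if len(launch_times_s) < 2 * window:
        raise ValueError("need at least 2 * window samples")
    early = mean(launch_times_s[:window])
    recent = mean(launch_times_s[-window:])
    return (recent - early) / early


# Made-up average VM launch times in seconds, oldest first.
samples = [10.0, 10.0, 10.0, 12.0, 15.0, 15.0, 15.0]
change = launch_time_trend(samples)
print(f"launch time changed by {change:.0%}")
```

A positive result means recent launches are slower than earlier ones, which would be the cue to start investigating.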
You can slice and dice your OpenStack monitoring data with filters—compute nodes and virtual machines can be filtered based on Region, Security group name, Compute node name, Availability zone, and more. Such filtering is particularly useful for tracking down elusive performance issues within large environments.
Smartscape analysis (see below) shows you how your VMs interact with one another and gives you an understanding of the vertical dependencies between your application components—virtual machines, processes, and services.
Performance analysis of OpenStack services
Let’s explore Dynatrace’s automated problem detection and root-cause analysis capabilities with a Keystone use case. In the example below, the Keystone service began to respond slowly to TCP requests due to memory saturation on one of the controller nodes. Dynatrace has automatically identified the underlying root cause of this issue and the impact of the problem.
Let’s drill down into the Keystone metrics to better understand what’s going on here. Click the Keystone process tile to analyze this process within the context of the detected performance problem.
Here on the Keystone process page we see that the response time of the Keystone service has increased significantly, from 200 ms to 2 s.
By clicking the View all log entries button, you can explore all of the log data that’s been generated by this process.
The Log viewer has uncovered numerous warnings within the Keystone.log file indicating that the authentication process has been failing.
Now let’s take a look at the controller node that caused the issue. As you can see below, memory was indeed exhausted; it reached almost 100% saturation.
Note further down in the Processes section that all OpenStack services running on the controller are listed. Click any of these individual processes to analyze their connections and understand their relationship to other processes.
Dynatrace reports an outage event when Keystone becomes completely unavailable (see below). Outages are a major concern because they prevent users from performing any operations (each API request requires a Keystone token).
Out of the box, Dynatrace automatically monitors your OpenStack environment for a wide range of potential log-based problem patterns. For example, Dynatrace detects when an OpenStack service can’t connect to a database or fails to authenticate.
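To give a feel for what log-based pattern detection means in general, here is a minimal Python sketch that counts log lines matching two problem patterns. The regular expressions, the pattern names, and the sample log lines are all made up for illustration. They are not Dynatrace's detection rules and not real OpenStack log output.

```python
import re

# Hypothetical patterns for illustration only.
PATTERNS = {
    "database_unreachable": re.compile(
        r"(cannot|can't|could not) connect to (the )?database", re.IGNORECASE
    ),
    "authentication_failed": re.compile(
        r"authentication failed|failed to authenticate", re.IGNORECASE
    ),
}


def scan(lines):
    """Count how many lines match each named pattern."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Made-up log lines, not copied from any real service.
sample_log = [
    "WARNING service-a: authentication failed for user demo",
    "ERROR service-b: could not connect to database at db-host",
    "INFO service-a: request completed",
    "WARNING service-a: failed to authenticate token",
]

print(scan(sample_log))
```

A real detector would also need to handle timestamps, severity levels, and rate thresholds, which this sketch leaves out.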
Monitoring supporting technologies
Another potential problem area that OpenStack admins need to keep an eye on is the set of technologies that are frequently deployed alongside OpenStack. These include load balancers (e.g., HAProxy), message brokers (e.g., RabbitMQ), and databases (e.g., MySQL).
To illustrate the challenges involved in monitoring the technologies that support OpenStack, here’s a problem we ran into within our own OpenStack environment. The RabbitMQ process in the example below was launched using the default file descriptor limit of 1024. Once this limit was reached, RabbitMQ stopped accepting new connections. This resulted in a Connectivity problem.
We wouldn’t have known about this problem if it weren’t for the RabbitMQ-specific counters that Dynatrace provides. All of this detail is included in the same view, so you don’t need to use multiple tools to get the full picture.
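The arithmetic behind a file descriptor problem like this one is simple, and the following Python sketch shows one way to classify descriptor usage against a limit. It is an assumption-laden illustration, not the way Dynatrace or RabbitMQ evaluates the counters. The function name `fd_usage`, the 80% warning threshold, and the sample counts are hypothetical. Only the limit of 1024 comes from the example above.

```python
def fd_usage(open_fds, soft_limit, warn_ratio=0.8):
    """Classify file descriptor usage against a soft limit.

    Hypothetical helper: returns (ratio, status), where status is
    "ok", "warning", or "exhausted". The 0.8 threshold is an
    arbitrary choice for this sketch.
    """
    if soft_limit <= 0:
        raise ValueError("soft_limit must be positive")
    ratio = open_fds / soft_limit
    if open_fds >= soft_limit:
        status = "exhausted"
    elif ratio >= warn_ratio:
        status = "warning"
    else:
        status = "ok"
    return ratio, status


# 1024 is the limit from the RabbitMQ example; the counts are made up.
for open_fds in (300, 900, 1024):
    ratio, status = fd_usage(open_fds, 1024)
    print(f"{open_fds}/1024 descriptors in use ({ratio:.0%}): {status}")
```

On Linux, a Python process can read its own limit with `resource.getrlimit(resource.RLIMIT_NOFILE)`. Checking another process such as RabbitMQ requires that process's own counters, which is where a monitoring tool comes in.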
OpenStack dashboard tiles
Dynatrace provides two different OpenStack tiles that you can add to your home dashboard.
The Regions tile displays relevant information related to the health of compute nodes and virtual machines, as well as OpenStack services such as Keystone, Glance, Nova, and more.
The Project tile provides insights into resource usage, taking assigned quotas into consideration. This information lets you plan resource usage for critical projects proactively and gives you early warning of resource capacity issues.
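As a rough illustration of quota-aware usage tracking, the Python sketch below compares a project's usage with its quotas and flags resources that are close to their limit. It is not how the Project tile is implemented. The function name `quota_report`, the resource names, the figures, and the 80% threshold are all hypothetical.

```python
def quota_report(usage, quota, warn_ratio=0.8):
    """Return {resource: (usage ratio, near_limit flag)}.

    Hypothetical helper for illustration. A non-positive quota is
    treated as "no limit" in this sketch and skipped.
    """
    report = {}
    for name, limit in quota.items():
        if limit <= 0:
            continue
        ratio = usage.get(name, 0) / limit
        report[name] = (ratio, ratio >= warn_ratio)
    return report


# Made-up figures for a single project.
usage = {"instances": 18, "cores": 40, "ram_mb": 51200}
quota = {"instances": 20, "cores": 80, "ram_mb": 102400}

for name, (ratio, near_limit) in quota_report(usage, quota).items():
    flag = "WARNING" if near_limit else "ok"
    print(f"{name}: {ratio:.0%} of quota used ({flag})")
```

In this made-up project, only the instance count is near its quota, so that is where capacity planning should start.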
To add an OpenStack tile to your home dashboard:
- Click the Home dashboard button in the upper-left corner.
- Click the Browse (…) button in the upper-right corner.
- Click Add tile.
- Select the Infrastructure filter in the left-hand navigation menu.
- Select the All regions tile or the Project tile.
Stay tuned for part two of this blog post series, to be published shortly. Part two will cover full-stack monitoring of applications that run in OpenStack clouds.