This article is the second part of our OpenStack monitoring series. Part 1 explores the state of OpenStack and some of its key terms. In this post, we take a closer look at your options for setting up a monitoring tool for OpenStack.
The OpenStack monitoring space: Monasca, Ceilometer, Zabbix, Elastic Stack (ELK Stack) – and what they lack
Monasca is the OpenStack Community’s in-house project for monitoring OpenStack. Defined as “monitoring-as-a-service”, Monasca is a multi-tenant, highly scalable, fault-tolerant open source monitoring tool. It works with an agent and it’s also easily extendable with plugins. After installing it on the node, users have to define what should be measured, what statistics should be collected, what should trigger an alarm, and how they want to be notified. Once set, Monasca shows metrics like disk usage, CPU usage, network errors, ZooKeeper average latency, and VM CPU usage.
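To make the "what should trigger an alarm" step more concrete, here is a minimal Python sketch of threshold-based alarm evaluation. It is not Monasca's API or implementation: the `AlarmDefinition` class, the `evaluate` function, and the sample measurements are hypothetical, and the metric name `cpu.user_perc` is used only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class AlarmDefinition:
    """Hypothetical alarm definition: which metric to watch and when to fire."""
    name: str
    metric: str
    threshold: float
    periods: int = 3  # number of most recent measurements to average


def evaluate(alarm, measurements):
    """Return 'ALARM', 'OK', or 'UNDETERMINED' for one alarm definition."""
    values = measurements.get(alarm.metric, [])
    if len(values) < alarm.periods:
        # Not enough data points to decide either way.
        return "UNDETERMINED"
    recent = values[-alarm.periods:]
    return "ALARM" if mean(recent) > alarm.threshold else "OK"


# Illustrative measurements, as an agent might report them for one node.
measurements = {"cpu.user_perc": [42.0, 91.5, 95.0, 97.2]}
high_cpu = AlarmDefinition(name="high-cpu", metric="cpu.user_perc", threshold=90.0)

state = evaluate(high_cpu, measurements)
print(high_cpu.name, state)
```

A real deployment would additionally attach a notification method (email, webhook, and so on) to each state change; that part is omitted from the sketch.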
Even though it’s a bit far-fetched to say that Ceilometer is an OpenStack monitoring solution, I decided to put it in this list because many people refer to it as a monitoring tool. The reality is, Ceilometer is the telemetry project of the OpenStack Community, aiming to measure and collect infrastructure metrics such as CPU, network, and storage utilization. It is a data collection service designed for gathering usage data on objects managed by OpenStack, which are then transformed into metrics that can be retrieved by external applications via APIs. Also, Ceilometer is often used for billing based on consumption.
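As a rough illustration of consumption-based billing, the sketch below aggregates usage samples per tenant and multiplies them by a price list. Everything here is an assumption made for the example: the sample layout, the meter names, and the rates are hypothetical and do not reflect Ceilometer's actual data model or API.

```python
from collections import defaultdict

# Hypothetical price list: cost per unit of each meter.
RATES = {"cpu_hours": 0.05, "storage_gb_hours": 0.001}


def bill(samples, rates):
    """Sum usage cost per tenant from a list of usage samples."""
    totals = defaultdict(float)
    for sample in samples:
        totals[sample["tenant"]] += sample["volume"] * rates[sample["meter"]]
    # Round to cents to keep floating-point noise out of the invoice.
    return {tenant: round(cost, 2) for tenant, cost in totals.items()}


# Illustrative samples, as an external billing application might
# retrieve them from a telemetry API.
samples = [
    {"tenant": "alpha", "meter": "cpu_hours", "volume": 100.0},
    {"tenant": "alpha", "meter": "storage_gb_hours", "volume": 2000.0},
    {"tenant": "beta", "meter": "cpu_hours", "volume": 10.0},
]

invoice = bill(samples, RATES)
print(invoice)
```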
Zabbix is enterprise-grade open source monitoring software for networks and applications. It is best suited to monitoring the health of servers, network devices, and storage devices, but it doesn’t collect highly granular or deep metrics. Once installed and configured, Zabbix provides availability and performance metrics for hypervisors, service endpoints, and OpenStack nodes.
Elastic Stack (ELK Stack)
Perhaps the most widely used open source monitoring tool which also works well with OpenStack is the Elastic Stack (aka ELK). It consists of three separate projects – Elasticsearch, Logstash, and Kibana – and is driven by the open source vendor Elastic.
The Elastic philosophy is simple: it couples good search capabilities with good visualization, which results in outstanding analytics. Their open source analytics tool – which now rivals big players like Microsoft, Oracle, or Splunk – supports OpenStack too.
Monitoring OpenStack with the ELK Stack starts with installing and configuring its log collector tool, Logstash. Logstash is the server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to Elasticsearch for indexing. Once installed and configured, Logstash starts to retrieve logs through the OpenStack API.
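The "transform" step of such a pipeline can be sketched in plain Python: take a raw log line, split it into named fields, and produce a JSON document ready for indexing. This is only an illustration, not Logstash configuration. The regular expression assumes a line layout of timestamp, process ID, level, logger name, optional request context, and message, and the sample line and the `parse_line` helper are made up for the example.

```python
import json
import re

# Assumed line layout: "<date> <time> <pid> <LEVEL> <logger> [<request>] <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) "
    r"(?P<level>[A-Z]+) "
    r"(?P<logger>\S+) "
    r"(?:\[(?P<request>[^\]]*)\] )?"
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Turn one raw log line into a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["pid"] = int(event["pid"])
    # Derive the service name (e.g. "nova") from the logger name.
    event["service"] = event["logger"].split(".")[0]
    return event


# Hypothetical sample line.
sample = (
    "2016-11-02 10:15:42.123 4321 ERROR nova.compute.manager "
    "[req-abc123] Instance failed to spawn"
)
event = parse_line(sample)
print(json.dumps(event, indent=2))
```

Lines that do not match the pattern return `None`, so a caller can decide whether to drop them or index them as unparsed text.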
Through the API, you get good insights into OpenStack Nova, the component responsible for provisioning and managing the virtual machines. From Nova, you get the hypervisor metrics, which give an overview of the available capacities for both computation and storage. Nova server metrics provide information on the virtual machines’ performance. Tenant metrics can be useful in identifying the need for change with quotas in line with resource allocation trends. Logstash also monitors and logs RabbitMQ performance.
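To show how hypervisor metrics translate into a capacity overview, the following sketch sums total and used resources across hypervisors and reports what is left. The field names (`vcpus`, `vcpus_used`, `memory_mb`, `memory_mb_used`, `local_gb`, `local_gb_used`) are assumed to resemble what a hypervisor listing returns, and the numbers and the `remaining_capacity` helper are invented for the example.

```python
RESOURCES = ("vcpus", "memory_mb", "local_gb")


def remaining_capacity(hypervisors):
    """Aggregate free capacity and utilization across all hypervisors."""
    summary = {}
    for key in RESOURCES:
        total = sum(hv[key] for hv in hypervisors)
        used = sum(hv[key + "_used"] for hv in hypervisors)
        summary[key] = {
            "free": total - used,
            # Guard against division by zero when no capacity is reported.
            "used_pct": round(100 * used / total, 1) if total else 0.0,
        }
    return summary


# Illustrative per-hypervisor stats.
hypervisors = [
    {"vcpus": 16, "vcpus_used": 12, "memory_mb": 65536,
     "memory_mb_used": 49152, "local_gb": 1000, "local_gb_used": 250},
    {"vcpus": 16, "vcpus_used": 4, "memory_mb": 65536,
     "memory_mb_used": 16384, "local_gb": 1000, "local_gb_used": 250},
]

capacity = remaining_capacity(hypervisors)
for resource, stats in capacity.items():
    print(resource, stats)
```

Tracking these figures over time is what makes quota and capacity decisions possible; a single snapshot only shows the current headroom.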
Finally, you want to visualize all the collected OpenStack performance metrics. Kibana is a browser-based interface that allows you to build graphical visualizations of the log data based on Elasticsearch queries. It allows you to slice and dice your data and create bar, line or pie charts and maps on top of large volumes of data.
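A chart of this kind is ultimately backed by an Elasticsearch aggregation query. The sketch below builds one such request body as a Python dictionary: error-level events from the last hour, grouped by service and bucketed over time. The field names `level`, `service`, and `@timestamp` are assumptions about how the logs were indexed, and `fixed_interval` is the parameter name used by newer Elasticsearch versions (older releases use `interval`).

```python
import json


def errors_per_service_query(window="now-1h", interval="5m"):
    """Build a query body counting ERROR events per service over time."""
    return {
        "size": 0,  # only aggregation results are needed, not the raw hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"level": "ERROR"}},
                    {"range": {"@timestamp": {"gte": window}}},
                ]
            }
        },
        "aggs": {
            "per_service": {
                "terms": {"field": "service"},
                "aggs": {
                    "over_time": {
                        "date_histogram": {
                            "field": "@timestamp",
                            "fixed_interval": interval,
                        }
                    }
                },
            }
        },
    }


query = errors_per_service_query()
print(json.dumps(query, indent=2))
```

The resulting JSON could be sent to a search endpoint or used as a reference when configuring the equivalent visualization in the Kibana interface.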
ELK Stack vs. Dynatrace: key differences to know
Monitoring OpenStack is not an easy task. Getting a clear overview of the complex application ecosystem built on OpenStack is even more difficult. Even though they provide good visibility into different OpenStack components and use cases, open-source tools like the ELK Stack clearly have several disadvantages:
- They are unable to see the causation of events
- They fail at understanding data in context
- They rely heavily on manual configuration
Because they are missing the big picture, companies often implement different monitoring tools for different silos. However, they quickly realize that with dozens of tools they are unable to identify the root cause of a performance issue. In these circumstances, how could they reduce MTTR and downtime? And with a number of separate tools, how could they ever see performance trends or predict capacity needs?
By using different monitoring tools for different use cases, companies miss out on exactly the monitoring capabilities that today’s complex business applications require:
- Full stack power, or to see the big picture
- AI-power, to understand data in context
- Automation power, to do this without any manual intervention
Okay, so how is all of this possible with OpenStack? Is there any intelligent OpenStack monitoring tool? In the next part we investigate this by focusing on the Dynatrace way of monitoring OpenStack. Stay tuned!