No eCommerce platform can be operated without a proper monitoring solution in place. In fact, monitoring or analytics alone isn't enough. If you are serious about your online business you need to do all of it: continuous performance management, user analytics, marketing analysis, business monitoring and customer service management. I like to call it:
With Dynatrace AppMon we can collect a wide variety of data, much more than plain monitoring data like CPU and memory metrics or network usage. Application performance management delivers insight into the application itself, down to the execution of individual methods and database statements. We also gain visibility into third-party services that our back-end applications depend on but that we cannot control.
At the other end of the spectrum, APM gives us insight into the end user's experience. This lets us also focus on user experience management, knowing exactly (not guessing) what actual users are doing while they browse our eCommerce offering. Today we can obtain a complete end-to-end view of a single user and every step they take while navigating our site. We can identify exactly which database statements were executed and which payment services were called when a user clicked "order now!". We can even extract the value of orders, or whether a cart was abandoned and why.
But with all that data, the domains begin to blur: is it user analytics, behavioral tracking, business analysis, performance management, or monitoring?
One tool for everything?
Does it make sense to use one tool for all of this? Can one APM tool satisfy all the needs of eCommerce monitoring? In my opinion: yes, it can. In my work I've been in situations where different stakeholders argued over whether metrics captured by one tool were the same as those captured by another. A common terminology was missing, and everyone thought "their tool" was correct and the only source of truth. So it became crucial to align the tools, or simply to use one tool that satisfies all stakeholder needs.
For example, this dashboard built for SAP Hybris' cloud service monitoring shows metrics from all of these domains, all based on data Dynatrace collects: server-side response time (operations), end-user response time and user satisfaction (analytics), and orders and revenue (business metrics).
Building this type of dashboard on a single data source – Dynatrace AppMon & UEM – eliminates the data and communication gap many organisations have right now, where the business relies on Google Analytics or Omniture while IT relies on APM. In this article I want to show how to build this kind of dashboard using the SAP Hybris eCommerce platform, the Hybris FastPack for Dynatrace, and AppMon's capability to stream live data to external systems. (The attentive Dynatrace user might notice that this is not an AppMon dashboard, neither web nor client. You are right, but read on – you will be surprised!)
That’s a lot of data!
Over the years Dynatrace evolved from a tech-savvy development and testing solution into a production APM solution. With the introduction of sophisticated user analytics and an intuitive web interface it has become a tool for both developers and casual users. Dynatrace has evolved into a full Digital Performance Management (DPM) solution.
With all the comprehensive data Dynatrace AppMon and User Experience Management collect, it has also become the "one true source" of data. Dynatrace-generated data is now accessible in different ways for different audiences. Yet the common terminology and, most importantly, the common dataset and metrics allow more efficient communication between users with different backgrounds (I've mentioned the problem of miscommunication in a previous post).
Leveraging Dynatrace data for different audiences
Back to the topic and some hands-on work! Here is an example of how to apply this "one-true-source" principle in a real-world environment. We will see how to use Dynatrace's real-time streaming capabilities to feed live data into a time-series database for external visualisation. This is based on a project I'm currently working on with my friends at SAP Hybris, who are using this to provide powerful monitoring and business dashboards to their end-customers.
Hybris decided to build a monitoring solution that provides its users with insight into their hosted eCommerce solution. This monitoring solution goes beyond mere system metrics and provides APM and business metrics as well.
Instead of manual periodic or ad-hoc reporting on different aspects of a customer's environment, Hybris was looking for an easy-to-access, self-service solution that customers can use to obtain the status of their hosted service. This solution should provide insight into different aspects:
- infrastructure metrics
- application insight and metrics
- end-user analytics
- business analysis
As every customer's environment is connected to and monitored by a central Dynatrace AppMon environment, three of these four categories are already collected by Dynatrace. Infrastructure metrics are collected by other tools in the datacenter.
Building the collection & visualization stack
For visualization Hybris chose Grafana as a flexible and powerful dashboarding solution. Besides the flexibility in building dashboards, the advantage of Grafana is its ability to connect to different datasources like Elasticsearch, Graphite or InfluxDB.
Because most of the collected data are time-series metrics, it makes sense to put them into a data store designed for exactly that purpose. So the choice was InfluxDB, the high-performance, high-availability database component of the TICK Stack by InfluxData.
Feeding data into InfluxDB is done via an intermediate component. For that purpose we are using Logstash, part of the Elastic Stack. Logstash is a powerful open source data collection, enrichment and transportation pipeline, and it allows us to transform Dynatrace data on the fly. It also lets us easily add other data sources in the future without touching the storage or visualization layer.
To connect the whole stack we need to feed data from Dynatrace to Logstash. There are two ways to feed live data from the Dynatrace server to external systems: the PureLytics stream, which contains end-user data (visits and user actions), and the Business Transaction Feed, which is a more generic data feed for back-end data collected by Dynatrace. For now I've chosen to use only the Business Transaction Feed, because it provides more flexibility for the Hybris-specific data that we are already capturing (e.g. transactions for orders, background jobs, page category performance data, etc.).
Live-streaming Dynatrace data to Logstash
Dynatrace's Business Transaction Feed uses Google's Protocol Buffers to export data, so to make use of that data we need a protobuf interpreter on the Logstash side.
At this point I need to give a short introduction to Logstash: Logstash uses input plugins to receive data from different sources (here: Dynatrace) and output plugins to write data to different destinations (here: InfluxDB). In between, it uses filters for operations on and modifications of the processed data.
So the first step is to configure Logstash's input plugin so that it can receive protobuf messages from Dynatrace. From the Dynatrace Client we can get the protobuf definition of Dynatrace's export format. This definition can then be compiled (using ruby-protoc) into an interpreter which we will use in our Logstash input plugin.
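As a rough sketch, an input section using the protobuf codec could look like the following. The port matches the feed configuration described below; the generated Ruby class name and file path are assumptions for illustration and depend on how you compiled the protobuf definition:

```conf
# Sketch only: receive Business Transaction Feed messages from Dynatrace.
# class_name and include_path are examples -- adjust them to your compiled .pb.rb file.
input {
  http {
    port  => 8080
    codec => protobuf {
      class_name   => "Export::Bt::BtExport"               # class generated by ruby-protoc
      include_path => ["/etc/logstash/dynatrace_bt.pb.rb"] # compiled protobuf definition
    }
  }
}
```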
We can now configure Dynatrace to send data to Logstash by pointing the Business Transaction Feed to our Logstash host, which is listening on port 8080.
To check whether the connection works we can simply add an output plugin configuration to Logstash that writes the received messages in JSON format to a file:
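A minimal sketch of such a debug output (the target path is an example):

```conf
# Debug only: write every decoded event as one JSON line to a file.
output {
  file {
    path  => "/tmp/dynatrace-btfeed.json"
    codec => json_lines
  }
}
```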
For performance reasons the Dynatrace server sends business transaction data in bulk messages, grouping multiple occurrences into one message. Unfortunately, this means that we need to do some processing on the Logstash side to split each message into individual data events before we write them to InfluxDB. This is where Logstash's filter chain comes in handy. The final Logstash configuration I'm using actually does a few more things:
- Split the bulk message into individual message events, one for each business transaction occurrence.
- Each individual message event is fed into Logstash again (looped back). This is a common pattern in high-load scenarios where multiple instances of Logstash are used (pre-filtering/analysis and processing), often in combination with message queues.
- Every business transaction event message is passed through a set of filter logic that modifies fields, removes data and adds additional fields (e.g. calculating a geohash tag on the fly). We also take care of dynamically setting "tags" from business transaction splitting measures. In InfluxDB, tags are used for non-numeric values, which makes them ideal for Dynatrace's splitting values. This dynamic tagging is required so that we can simply change or add business transactions in Dynatrace without worrying about the Logstash or InfluxDB configuration.
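The splitting step above can be sketched as follows, assuming the bulk message carries its occurrences in a field called `occurrences` (the real field name comes from Dynatrace's protobuf definition):

```conf
# Sketch of the splitting step; "occurrences" is an assumed field name.
filter {
  # Fan the bulk message out into one event per business transaction occurrence
  split {
    field => "occurrences"
  }
  # Drop transport metadata we do not want to store in InfluxDB
  mutate {
    remove_field => ["headers"]
  }
}
```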
Sending data from Logstash to InfluxDB
Logstash comes with an output plugin for InfluxDB, so once a message has passed through the filter chain it's easy to store the data. As InfluxDB doesn't require any schema definition, we can simply create measurements on the fly for every business transaction we export from Dynatrace. So the output plugin configuration contains a variable target measurement setting that is determined by the currently processed message.
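A sketch of such an output section, assuming each event carries the business transaction name in a field called `btName` and that the splitting values we want stored as tags sit in fields like `application` and `country` (all names are illustrative):

```conf
# Sketch only: host, database and field names are examples.
output {
  influxdb {
    host        => "influxdb.example.com"
    db          => "dynatrace"
    measurement => "%{btName}"                 # dynamic target measurement per event
    use_event_fields_for_data_points => true   # store remaining event fields as values
    send_as_tags => ["application", "country"] # splitting values stored as tags
  }
}
```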
Now that Logstash writes to InfluxDB, we will see measurements being created in InfluxDB as soon as data is processed in Dynatrace. For every business transaction that has the "Export results via HTTP" option set, a new measurement is created; the result measures of that business transaction are added as numeric fields and its splittings are added as tags.
Querying one measurement, you will see some default tags plus all the splittings that were defined for the business transaction in Dynatrace.
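For example, with the `influx` command-line client you could inspect an exported business transaction like this (the measurement name "Orders" is an example):

```sql
-- Inspect the most recent occurrences of an exported business transaction
SELECT * FROM "Orders" ORDER BY time DESC LIMIT 5
-- List the tag keys, i.e. the Dynatrace splittings, of that measurement
SHOW TAG KEYS FROM "Orders"
```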
Note that InfluxDB stores every single occurrence; it doesn't aggregate or condense the data over time like Dynatrace does in its own Performance Warehouse. InfluxDB is built for exactly that purpose!
Visualising data in Grafana
Now for the most fun part: the visualisation layer of the stack. Setting up Grafana is easy. Besides InfluxDB we could also use Elasticsearch and other datasources; Grafana even supports mixing multiple datasources on one dashboard. Since most of the data we feed from Dynatrace is time-series based, InfluxDB is just fine and very performant for Grafana.
Dashboards are easily created with wizard-like support for building queries to InfluxDB.
At this point one might ask: "Why not simply use the Dynatrace web dashboards instead of this whole stack and Grafana?" That's a valid question, but there are a few additional considerations we had to take into account for this project:
- Separation of the presentation layer, due to security requirements.
- The independent processing and storage layer (InfluxDB/Logstash) allows integration of almost any other datasource while using the same presentation layer.
- Keeping only the data we want to present in InfluxDB is very efficient, as InfluxDB is optimised for time-based queries and aggregations.
- Very high granularity of time-series and high performance queries.
Below are three dashboards I created, each with special features that are nice to use but may not be obvious at first glance.
Dashboard 1: True Rate Measures & Real-Time Aggregation
First, you can filter all the data on the dashboard by customisable variables; Grafana calls this templating. You can think of it like a dashboard filter in Dynatrace, but a bit more flexible. In the top row you see a few metrics (averages and 90th percentile). These are actually not calculated by Dynatrace but by InfluxDB – it's optimised for that!
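In Grafana this combination might look as follows; all measurement, field and tag names here are assumptions. A template variable is populated with an InfluxQL metadata query, and the panel query lets InfluxDB do the percentile math:

```sql
-- Template variable query (populates the $application dashboard filter)
SHOW TAG VALUES FROM "PageActions" WITH KEY = "application"
-- Panel query: 90th percentile calculated by InfluxDB, filtered by the variable
SELECT percentile("responseTime", 90)
FROM "PageActions"
WHERE "application" =~ /^$application$/ AND $timeFilter
GROUP BY time($interval)
```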
You will notice a "Page Impressions" graph. There is another advantage behind it: a true rate measure. You might have noticed that, due to the way Dynatrace ages data and adapts the chart resolution, there is currently no way to chart a true rate like PI/s. When you switch to a larger timeframe, the resolution is adapted and you will see PI/m, PI/30s or PI/5m instead. That's not necessary with InfluxDB and Grafana.
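With InfluxDB the rate stays fixed regardless of the selected timeframe. A sketch of such a PI/s panel query (measurement and field names are assumptions):

```sql
-- Page impressions per second, independent of the dashboard's timeframe
SELECT count("responseTime")
FROM "PageActions"
WHERE $timeFilter
GROUP BY time(1s) fill(0)
```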
Dashboard 2: eCommerce Metrics & On-the-Fly Math Operations
The second example shows a few important eCommerce metrics: converted carts and abandoned carts (and their potential value). This is something you can also do in a Dynatrace dashboard, but you'd need two business transaction configurations for that, to filter either the converted or the non-converted carts. Here we just export one, containing both, and use a query filter.
If you look at the chart in the center you will notice that the "abandoned cart value" is actually negative. This nice visualisation uses a simple trick: mathematical operations on the data returned by InfluxDB. We just multiply by -1 to get the nice effect of the potential (lost) revenue showing below zero, or "under the hood".
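Assuming one exported business transaction with a tag marking conversion and a numeric cart-value field (names are illustrative), the two series could be sketched like this:

```sql
-- Converted cart value, plotted above zero
SELECT sum("cartValue") FROM "Carts"
WHERE "converted" = 'true' AND $timeFilter GROUP BY time(1m)
-- Abandoned (lost) cart value, multiplied by -1 to plot it below zero
SELECT sum("cartValue") * -1 FROM "Carts"
WHERE "converted" = 'false' AND $timeFilter GROUP BY time(1m)
```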
Dashboard 3: Flexible templating for context information
This one is my favorite! It's a dashboard meant to be interactive, like a visit search in Dynatrace. You will notice that there are two filters (OrderID, User). These filters are applied to multiple queries and measurements that populate the tables and charts. These are backed by three business transactions: "Orders by OrderID", "Visit by OrderID and User" plus – and this is special – a business transaction for "Background Processing by OrderID" of orders.
If you have ever created a "measure explosion" in Dynatrace, you know that it is not advisable to create a business transaction configuration that splits by a potentially infinite number of values (like an OrderID). However, if you create that business transaction and do not store it, but only feed it to InfluxDB, those infinitely many values are just tags that we can use for searching and filtering, and they have no impact on performance.
Plus, by splitting different business transactions by the same OrderID we can "link" them on this dashboard. So when the user enters an OrderID, they will immediately see when the customer created the order and where they came from. We can also track the processing of the order and its status in the background, perhaps long after the user's visit has completed.
Advanced eCommerce Monitoring
Of course there is much more you can do! The intent of this article is to show that one datasource (Dynatrace) can provide all the data required for extensive eCommerce monitoring that benefits different use cases. The casual user might only look at the dashboards in Grafana, while the DevOps person might use the same data in Dynatrace, with the added ability to drill down into the code if required. At the same time, customer service can easily tell whether a customer's order is processing or delayed somewhere in Hybris' business process engine. Finally, there may be users only interested in eCommerce metrics like revenue, orders and conversion rate. The benefit of a single datasource is clear: there is no confusion about metrics and naming, e.g. whether a page impression in one system is truly the same as in the other.
If you like my approach to eCommerce management and monitoring, or have an idea to contribute, leave a comment; I'm happy to explore!
If you want to learn more about Dynatrace for SAP Hybris eCommerce, visit dynatrace.com, or start right away by exploring Dynatrace for Hybris yourself with a free trial license. Copy the above configuration for Logstash and you should soon be working with your own dashboards in Grafana.