Just as Walt started the engine, the car beeped. He realized he might be out of gas by the time he reached his office in downtown Ragpo, Glovania.
Luckily he recalled there was a gas station at the main junction just before getting to the speedway. He pulled over and rushed to the dispenser only to find a notice saying:
Sorry, we are out of gas. Waiting for the delivery
Now he had to find another gas station, but he was not sure whether he would get there.
A few days before, the manager at that gas station had noticed it was time to order more gasoline, so he opened the application for placing orders at GlovaniaGas. But the process kept hanging for quite a while and eventually he got disconnected. He called the help desk and learned that the company was experiencing problems with its oil derivatives order processing, which runs on an SAP system.
Think about SAP Application Delivery Chain Performance as a Logistics Chain
One of our clients, GlovaniaGas, the major gas and oil producer in Glovania (name changed for commercial reasons), implemented SAP to standardize and optimize its business processes by managing its supply chain, which includes selling gas into the national network of gas stations. But it takes more than a big-brand software solution to keep these cars gassed up. GlovaniaGas started experiencing problems when customers (the gas station chains) complained about their orders not being fulfilled. Transactions were getting very slow or would even get aborted. As a consequence, the gasoline would not be distributed and trucks would be running with tanks half full, causing logistics and distribution headaches. What was the reason for that? The Operations team at GlovaniaGas had to quickly figure out why so many of their 60,000 SAP users were unhappy while everything looked good within the SAP application.
According to the report published by IBM in August 2012 (Impact of Downtime on SAP Enterprise Supply Chains), the average cost of downtime on an SAP system can range between $535,780 and $838,100 per hour, i.e., almost $15,000 is lost by the company every minute SAP is down (see Figure 1). If we take into account the cascading impact of the downtime, the costs may rise by a factor of 5 to 10.
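The per-minute figure follows from simple division of the hourly range. The short sketch below reproduces that arithmetic, using only the two hourly figures and the 5-10x multiplier quoted above. The function and variable names are illustrative assumptions, and applying the multiplier directly to the hourly cost is one simple reading of "cascading impact", not a method taken from the report.

```python
# Hourly downtime cost range quoted above from the IBM report (USD per hour).
HOURLY_COST_LOW = 535_780
HOURLY_COST_HIGH = 838_100


def per_minute(hourly_cost: float) -> float:
    """Convert an hourly cost into a per-minute cost."""
    return hourly_cost / 60


def with_cascading_impact(hourly_cost: float, factor: float) -> float:
    """Scale an hourly cost by a cascading-impact multiplier (5 to 10 in the text)."""
    return hourly_cost * factor


low_per_minute = per_minute(HOURLY_COST_LOW)
high_per_minute = per_minute(HOURLY_COST_HIGH)
cascading_low = with_cascading_impact(HOURLY_COST_LOW, 5)
cascading_high = with_cascading_impact(HOURLY_COST_HIGH, 10)

print(f"Per minute: ${low_per_minute:,.0f} to ${high_per_minute:,.0f}")
print(f"Per hour with cascading impact: ${cascading_low:,.0f} to ${cascading_high:,.0f}")
```

The upper end of the hourly range comes to just under $14,000 per minute, in the same ballpark as the rounded figure quoted above.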
Ensuring that the myriad moving pieces, processes and people involved in a large supply chain all work together smoothly is not an easy task. That is where ERP systems like SAP can help. They help to define, formalize and standardize all the vital end-to-end processes that support the company's business. But … what happens when things are not going as expected, when users are complaining about poor service performance, or worse, when they are stalled and cannot do their job? This has a clear impact on the company's overall performance, including revenue generation, employee productivity, manufacturing delays, and logistics problems. And yet, how can you tell whether the problem lies within the SAP instance or somewhere else along the service delivery chain, so that the appropriate corrective actions can be taken?
How Does This Affect Your Business?
An SAP application is like a logistics terminal. It receives orders from customers (requests across the network), processes them (calls to application servers, databases etc.), then receives back the goods to be delivered (the application response to the request) and sends them back to the customer (the response sent over the network).
The performance of this logistics center is important. You should have full control of and visibility into it. From your customer’s point of view, however, what matters is that the business transaction is executed without problems and that it results in timely delivery of goods or services. This can be achieved only when the goods to be sent are ready on time and when the shipping and delivery process works flawlessly: in line with inventory availability, packaging specifics, delivery handling needs and so on. Business success can suffer even when the goods are of the finest quality: your company's reputation is damaged when a delivery is delayed, an address is missed or packages arrive damaged.
Similarly, your computer network has to work hand in hand with the application server. You need a holistic view of the whole application delivery chain, including network performance, to ensure that application transactions (the goods) are processed and delivered intact and on time.
The problems, which started after GlovaniaGas decided to run the SAP application themselves, had an impact on many levels of company operations: from logistics, such as inventory management (both at the oil company and at the gas stations) and oil truck management and dispatching, through decreased productivity, to revenue deferred or lost when end users turn to the competition in the future.
Seeing the Bigger Picture
SAP is a challenging environment from the application performance management point of view – this is necessarily so, given the complexity of the business process and integration it has to control.
We often hear from our customers who have SAP monitoring tools deployed at every server that the Operations team repeatedly assures them “it is not the network” each time they experience performance incidents that remain without resolution. Business sponsors of SAP implementations perceive IT as being busy with itself rather than helping the business run smoothly and unleashing the company growth promised by SAP.
Therefore, a straightforward and holistic approach to performance management works best for SAP. This holistic view starts with understanding the actions of the end user: how, when, the context and the result. We need to be able to tell what the server contribution is, what the network part is, and whether it all works together as it should. In the SAP domain, this translates to understanding all transactions executed, for all users, all the time. Specifically:
- The name of the SAP user who performed the transaction.
- The transaction name, i.e., T-Code.
- Performance experienced by the user executing the transaction as it flows through the whole delivery channel: from the user’s desktop to the data center and back.
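The three items above can be pictured as one record per executed transaction. The following is a minimal sketch of such a record, not the data model of any particular monitoring product: the class name, the field names, the sample user "jdoe" and the simplification that operation time is the sum of a network part and a server part are all assumptions made for illustration. VA01 is used as the sample T-Code because it is the standard SAP transaction for creating a sales order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SapTransactionRecord:
    """One executed SAP transaction as seen from the end user's side (hypothetical model)."""

    user: str              # name of the SAP user who performed the transaction
    tcode: str             # transaction name, i.e., T-Code
    network_time_s: float  # time spent on the network, in seconds
    server_time_s: float   # time spent in the data center tiers, in seconds

    @property
    def operation_time_s(self) -> float:
        """Total time experienced by the user (simplified to network plus server)."""
        return self.network_time_s + self.server_time_s

    @property
    def network_share(self) -> float:
        """Fraction of the operation time attributable to the network."""
        total = self.operation_time_s
        return self.network_time_s / total if total > 0 else 0.0


# Illustrative values only, not taken from customer data.
record = SapTransactionRecord(
    user="jdoe", tcode="VA01", network_time_s=13.0, server_time_s=1.0
)
print(record.operation_time_s, round(record.network_share, 2))
```

Keeping the network and server parts side by side on every record is what later makes it possible to say which of the two dominates for a given user, T-Code or location.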
When Michael O’Leary became the CEO of Ryanair, the company was on the verge of bankruptcy. He turned it into a prosperous business that thrived even during the economic downturn, when other big airlines were struggling. How did he manage to do that? He basically dissected the whole business operation into a series of single steps and looked at each of them from the perspective of saving even as little as a few cents per client.
We believe that the same applies to monitoring performance and troubleshooting business processes in the company. We cannot look at only one piece of the puzzle; we need a holistic view of all components. Figure 2 shows an overview of the key components of an SAP system.
Achieving this perspective in the SAP environment can be difficult if approached the traditional way, i.e., by looking at application visibility and network visibility separately. Integrating many data sources results in a patchwork picture that covers all angles but is disjointed, needlessly complex and ultimately unusable.
To produce a coherent picture we propose a converged approach of a single overarching performance data collection method: a network probe measures transactions along the application delivery chain, i.e., between end users and the SAP server, between the SAP server and the database server, and between Citrix and end users.
The challenge that remains is to enable detailed enough visibility into specific SAP and database transactions. This cannot be achieved by just looking at the network packets and measuring the response time at the TCP level. Precise reporting on application-specific messages, e.g., SAP TCodes, SQL queries or user names, including encrypted transactions, is mandatory. We need a network probe that goes beyond network flow tracking and into application message decoding.
Finding the Solution
The Operations team at GlovaniaGas was assigned a very challenging task: they had to figure out what was causing the performance problems that made many transactions in the SAP system run slowly or never actually finish.
The problems started roughly after GlovaniaGas moved their whole SAP infrastructure from a third-party supplier to their own data center. No wonder, then, that SAP was often blamed as the root cause of those problems.
The Operations team started their analysis by looking at the Application Health dashboard (see Figure 3). It shows three applications that report performance problems and indicates the data center tiers as a potential cause. Figure 3 additionally shows more details about the selected application, where the Operation Time breakdown indicates network problems, with the Network Time above 13 seconds.
Because the Application Health dashboard indicated that problems could be caused by the data center tiers, the Operations team decided to further investigate the problem using the Data Center Analysis dashboard to get an overview of the status of SAP components. Figure 4 shows the worst-performing modules executed at the SAP tier. The report also lists users that were affected the most by current problems. The team noticed that three users with the worst performance were coming from the same location. Within the selected time range there are two weekend days for which the traffic was minimal.
The Operations team drilled down through the worst-performing module, i.e., Sales and Distribution, to the list of operations executed in this module (see Figure 5). For those that were experiencing the worst performance, the Operation Time breakdown also pointed to the network rather than the application. Figure 5 highlights one of the underperforming TCodes in the Sales and Distribution module.
As a final step, the Operations team took the hint from those three most affected users (see Figure 4) and checked the Location Health dashboard to see which locations were affected. Figure 6 shows that one location does indeed have worse application performance, affecting more users than other locations. For this location, the Operations team could examine which users were affected the most. The team could also see that the Sales and Distribution application was affecting a lot of users. The Operation Time breakdown again indicates that it was a network problem rather than an application problem.
Eventually the Operations team concluded, on the basis of clear evidence, that the poor performance was, contrary to common belief, caused by the network infrastructure: in this case one of the sites was affecting the overall end user experience and impacting business operations.
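The reasoning the team followed, grouping measurements by location and checking whether network time or server time dominates, can be sketched in a few lines. This is only an illustration of the idea and not how the dashboards are implemented: the site names, user names, timings, the 50% threshold and the function name are all hypothetical.

```python
from collections import defaultdict

# Hypothetical measurements: (location, user, tcode, network_time_s, server_time_s).
SAMPLES = [
    ("Site A", "user01", "VA01", 13.2, 0.9),
    ("Site A", "user02", "VA01", 12.5, 1.1),
    ("Site A", "user03", "VA02", 14.0, 0.8),
    ("Site B", "user04", "VA01", 0.4, 1.0),
    ("Site B", "user05", "VA02", 0.3, 0.9),
]


def summarize_by_location(samples):
    """Aggregate timings per location and name the dominant time component."""
    totals = defaultdict(lambda: {"network": 0.0, "server": 0.0, "users": set()})
    for location, user, _tcode, network_s, server_s in samples:
        entry = totals[location]
        entry["network"] += network_s
        entry["server"] += server_s
        entry["users"].add(user)

    summary = {}
    for location, entry in totals.items():
        total = entry["network"] + entry["server"]
        share = entry["network"] / total if total > 0 else 0.0
        summary[location] = {
            "users": len(entry["users"]),
            "network_share": share,
            # Assumed rule of thumb: whichever part exceeds half the time dominates.
            "dominant": "network" if share > 0.5 else "server",
        }
    return summary


for location, info in sorted(summarize_by_location(SAMPLES).items()):
    print(location, info["users"], f"{info['network_share']:.0%}", info["dominant"])
```

In this made-up data set, one site spends most of its operation time on the network while the other does not, which mirrors the pattern the team observed across Figures 3 to 6.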
Some would argue that thorough collection of system metrics from SAP infrastructure provides a detailed enough picture of application performance. To use an analogy, this is similar to precisely tracking temperature and humidity at the production facility, plus the electricity consumption and the level of noise produced by the machines. Although all those indicators are important for checking the health of the facility they do not show the overall picture of the production progress, how the supply chain works and what the delivery times are. You need an end-to-end view from the end user perspective to know how your business (performance) is perceived by your customers.
The IT infrastructure, especially SAP services, is vital for many business operations. According to the aforementioned report, a company can lose as much as $15,000 per minute when SAP services are down or not operating at their optimal pace. Ensuring that business operations are not affected requires tools that provide a holistic view of the complete end-to-end infrastructure. Our client, GlovaniaGas, used Compuware Dynatrace Data Center Real User Monitoring (DCRUM) to complement the SAP Solution Manager with an end-user-focused fault domain isolation workflow, and they quickly zeroed in on the actual root cause of the problem. In their case it was not an SAP-related problem but a network infrastructure problem.
(This article was co-authored with Krzysztof Ziemianowicz, with materials contributed by Rafael Messias and Vincent Geffray based on original customer data. Screens presented may differ in the most recent releases of the product while delivering the same value.)