This blog post is about how a new generation of BOTs impacted our application performance, exploited problems in our deployment and skewed our web analytics. I explain how we dealt with it and what you can learn to protect your own systems. A positive side effect of identifying these requests is that we can correct the web analytics metrics we report to management. Tools like Google Analytics can’t exclude all of these requests because they cannot always be identified by IP address or User Agent alone. Because these BOTs are really clever, you need a clever approach to finding them. Here is how we did it.
Best Practice: Only Optimize Valid Requests
Every application owner wants fast application performance. That’s also my goal when it comes to serving content for our APM Community members. But, instead of “blindly” trying to optimize each individual content page, I came up with one best practice that I want to share with you: Only optimize valid requests!
It started when we kept seeing more requests from BOT networks. The tricky thing with BOTs is that they do not always identify themselves as BOTs, e.g. through a specific User Agent string or well-known IP addresses. Therefore, from time to time, I sit down, analyze slow and erroneous requests, and try to figure out how these requests can be optimized. But before I start optimizing, I check whether these requests actually come from real community users or whether we should instead block them based on a newly discovered pattern.
Step 1: Identify Slow Transactions
I like the fact that I get a PurePath for every single web request (not just for slow ones or a handful every couple of minutes) that gets executed against our community. It allows me to play data nerd and dig in in order to identify why certain things are as they are. I start out by looking at all the PurePaths that came in for requests on our PUB (=public space) in a rather busy time period and sort this list by response time:
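If you do not have per-request traces at hand, the same first pass can be approximated from any list of request records. The following is only a minimal sketch: the `RequestRecord` fields and the `slowest_requests` helper are hypothetical names chosen for illustration, not part of any Dynatrace API.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    # Hypothetical record layout; adapt it to whatever your logs provide.
    url: str
    client_ip: str
    response_ms: float


def slowest_requests(records, url_prefix, limit=10):
    """Return the slowest requests whose URL starts with url_prefix."""
    matching = [r for r in records if r.url.startswith(url_prefix)]
    # Sort descending so the slowest request comes first.
    return sorted(matching, key=lambda r: r.response_ms, reverse=True)[:limit]
```

Calling `slowest_requests(records, "/pub/")` would then give a list comparable to the one I sort by response time, restricted to the public space.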
Step 2: Identify a Request Pattern
The screenshot above already highlights a strange access pattern for some best practice documents we provide on the community portal for download. Why strange? Because these best practice pages are already rather old and we haven’t promoted them in a while. To cut it short: I didn’t expect to see that URL being requested at all, despite the great content it contains 🙂
Now, does this mean people are suddenly interested in this content, or is something else going on? Maybe a BOT attack? The Server Timeline Dashlet, one of my recent favorites when it comes to diagnosing problems, shows me that this interest in the content is very localized: China calling!
What’s very interesting with these BOTs is that they use a specially prepared URL and special parameters that cause one of the plugins installed on our community to throw hundreds of exceptions. I assume these BOTs just try to exploit well-known problems. It is also time to look into a better configuration of these plugins to prevent these problems right from the start. The following screenshot gives us more details on these problems and allows us to block these types of parameters in the future:
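The pattern described here (an unexpected URL, requests concentrated in one region, many exceptions per request) can also be looked for programmatically. This is a rough sketch under assumptions: the input tuples `(path, country, exception_count)` and both thresholds are made up for illustration and would need tuning against real traffic.

```python
from collections import Counter


def summarize_by_url_and_country(records):
    """Count requests and exceptions per (path, country) pair."""
    requests = Counter()
    exceptions = Counter()
    for path, country, exception_count in records:
        key = (path, country)
        requests[key] += 1
        exceptions[key] += exception_count
    return requests, exceptions


def suspicious_keys(records, min_requests=3, min_exceptions_per_request=10):
    """Return (path, country) pairs with many requests and many exceptions each.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    requests, exceptions = summarize_by_url_and_country(records)
    return sorted(
        key
        for key, count in requests.items()
        if count >= min_requests
        and exceptions[key] / count >= min_exceptions_per_request
    )
```

Anything this returns is only a candidate for manual review; a popular page with a genuine bug would show up here as well.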
Step 3: Exclude Bad Requests
The measures we take to exclude these bad requests are just “the regular” steps you would take: excluding certain IP ranges and User-Agent strings, but also URLs with certain parameters. We use IIS as a frontend web server, together with a URL rewriting plugin that checks for bad parameters and URLs before passing requests on to the backend Java application server (Tomcat).
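The three checks can be sketched in a few lines. Note that this is not our IIS rewrite configuration, just an illustration of the logic in Python; the network range, User-Agent fragment and parameter name below are placeholders, not the values we actually block.

```python
import ipaddress
from urllib.parse import parse_qs, urlsplit

# Placeholder values for illustration only (203.0.113.0/24 is a documentation range).
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BLOCKED_AGENT_FRAGMENTS = ["examplebot"]
BLOCKED_PARAMS = {"exploitParam"}


def is_blocked(client_ip, user_agent, url):
    """Return True if the request matches any of the three exclusion rules."""
    # Rule 1: client IP falls inside a blocked range.
    ip = ipaddress.ip_address(client_ip)
    if any(ip in network for network in BLOCKED_NETWORKS):
        return True

    # Rule 2: User-Agent contains a blocked fragment (case-insensitive).
    agent = user_agent.lower()
    if any(fragment in agent for fragment in BLOCKED_AGENT_FRAGMENTS):
        return True

    # Rule 3: URL carries a blocked query parameter, even with an empty value.
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return any(name in BLOCKED_PARAMS for name in params)
```

Doing this check on the frontend, before the request reaches the application server, is what keeps the backend from spending time on these requests at all.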
Excluding these requests has several benefits. The biggest for us is that our web server (IIS) doesn’t get spammed with requests that consume a lot of precious network bandwidth and socket connections. During peak BOT traffic times we see IIS spending most of its time in simple I/O, blocking worker threads that would be needed for “real” requests. We also frequently hear from our customers that they see a substantial amount of time spent in web server I/O. Sending back the content over a slow network connection is not the only reason for it, but it is one of them. If you think the network might be a problematic area for you, check out some of our network performance related blog posts.
Step 4: Optimize Good Requests
Now, what do we do with the good requests once we only have good ones on our system? We optimize them. Depending on your application, there are different areas to do so. Over the last few years we have blogged a lot about performance hotspots and problem patterns. I suggest you read up on them by starting here: Top Performance Landmines. The topics range from content optimization to bad coding, all the way into the database. For one of these best practices pages I already found several good starting points for optimization. For one thing, it’s about time we follow well-known web performance optimization practices as highlighted by tools such as Dynatrace, YSlow or PageSpeed:
Do you have Best Practices to share?
How do you tactically improve performance? Do you have your own best practices, log files or measures you look at on a regular basis? Ping me through this blog, find me on Twitter (@grabnerandi) or leave a comment on our Dynatrace Forum if you are a customer or free trial user. It would be great to get additional best practices that I could share at our upcoming Global Performance User Conference in October: PERFORM 2014.