Dynatrace adds Cassandra monitoring via Thrift

In Agent version 1.61 we introduced support for the communication protocol Thrift. Thrift is a cross-platform, cross-language communication framework that behaves very much like web services and therefore automatically shows up in Dynatrace Ruxit monitoring. It’s used by a wide variety of projects, most prominent among them Cassandra. Thus you can now see Cassandra databases as a service, see traffic into Cassandra from your Java application’s… read more

Top Performance Problems discussed at the Hadoop and Cassandra Summits

In the last couple of weeks my colleagues and I attended the Hadoop and Cassandra Summits in the San Francisco Bay Area. It was rewarding to talk to so many experienced Big Data technologists in such a short time frame – thanks to our partners DataStax and Hortonworks for hosting these great events! It was also great to see that performance is becoming an important topic… read more

So What? – Monitoring Hadoop beyond Ganglia

Over the last couple of months I have been talking to more and more customers who are either bringing their Hadoop clusters into production or who have already done so and are now getting serious about operations. This leads to some interesting discussions about how to monitor Hadoop properly, and one question pops up quite often: Do they need anything beyond Ganglia? If yes, what should they do… read more

Speeding up a Pig+HBase MapReduce job by a factor of 15

The other day I ran a Pig script. Nothing fancy; I loaded some data into HBase and then ran a second Pig job to do some aggregations. I knew the data loading would take some time as it was multiple GB of data, but I expected the second aggregation job to run much faster. It ran for over 15 hours and still was not done. This… read more

How I Identified a MongoDB Performance Anti Pattern in 5 Minutes

The other day I was looking at a web application that was using MongoDB as its central database. We were analyzing the application for potential performance problems, and within 5 minutes I detected what I must consider to be a MongoDB anti-pattern, one that had a 40% impact on response time. The funny thing: It was a Java best practice that triggered it! Analyzing the Application The first thing I… read more

Is Application Performance a Big Data Roadblock? What Early Adopters Are Learning

Based on the recent hype, we must all be convinced that Big Data is the opportunity of the decade. While business leaders are imagining the potential uses for these massive amounts of information, technical teams are struggling to harness its power. Perhaps the most practical challenge involves application performance – because the application is what the end-user sees and interacts with. Regardless of what a Big Data application is designed… read more

Lessons learned from real world BigData implementations

In recent weeks I visited several Cloud and Big Data conferences. The Big Data Innovation in Boston in particular gave me a lot of insight. Some people consider only the technology side of Big Data technologies like Hadoop or Cassandra. The real driver, however, is a different one. Business analysts are discovering Big Data technologies as a means to leverage tons of existing data and ask questions about customer… read more

About the Performance of Map Reduce Jobs

One of the big topics in the BigData community is Map/Reduce. There are a lot of good blogs that explain what Map/Reduce does and how it works logically, so I won’t repeat it (look here, here and here for a few). Very few of them, however, explain the technical flow of things, which I at least need in order to understand the performance implications. You can always throw more… read more

Pagination with Cassandra and what we can learn from it

Like everybody else, it took me a while to wrap my head around the BigTable concepts in Cassandra. The brain needs some time to accept that a column in Cassandra is really not the same as a column in our beloved RDBMS. After that I wrote my first web application and ran into a pretty typical problem. I needed to list a large number of results and needed to page… read more