As all of you (hopefully) know, we run load tests for Dynatrace in different stages, including regular regression performance tests on daily builds from the trunk. Looking into the performance of the Web UI, which is part of the load testing scenario, I found the following top contributing Web UI REST calls:
The REST call /e/2/rest/startscreen/data/NETWORK_MEDIUM was identified as a hot spot from multiple perspectives (CPU time, response time, and total response time contribution to the Web UI service). Looking deeper into the response time hotspot view for this service call shows multiple interesting results:
Code execution time is about 4.5 times longer than response time, meaning there is a lot going on in parallel when executing the request.
Code execution also shows that CPU time is significantly lower than execution time, indicating that RUNNABLE threads are waiting on CPU cores. Looking deeper into the stack traces shows that the method createColumnFromCqlRow is a top-level contributor, accounting for 45% of execution time.
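To make the two ratios behind these observations concrete, here is a small sketch of the arithmetic. The millisecond values are made up for illustration and are not measurements from our load test; only the 4.5 factor matches what we saw. The class and method names are hypothetical as well.

```java
// Illustrative only: the millisecond values below are invented, not measured.
final class HotspotRatios {

    /** Summed execution time of all threads divided by wall-clock response time. */
    static double parallelismFactor(double executionMs, double responseMs) {
        return executionMs / responseMs;
    }

    /** Fraction of execution time during which threads actually ran on a CPU core. */
    static double cpuShare(double cpuMs, double executionMs) {
        return cpuMs / executionMs;
    }

    public static void main(String[] args) {
        double responseMs = 200.0;   // hypothetical wall-clock response time
        double executionMs = 900.0;  // hypothetical execution time summed over all threads
        double cpuMs = 300.0;        // hypothetical CPU time summed over all threads

        System.out.printf("parallelism factor: %.1f%n",
                parallelismFactor(executionMs, responseMs));
        System.out.printf("CPU share of execution time: %.2f%n",
                cpuShare(cpuMs, executionMs));
    }
}
```

A parallelism factor well above 1 means several threads work on the request at the same time. A CPU share well below 1 means those threads spend much of their execution time without a core.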
Drilling further into createColumnFromCqlRow surprised us: basically all of the time is spent within the DataStax driver method getBytes! We use this method to retrieve data for Cassandra column keys and column values that are stored as binary objects within Cassandra (for almost all column families we use our own (de-)serialization to store column keys and values, partly for performance reasons). We expected accessing binary object columns to be very fast, as no deserialization is needed within the DataStax Cassandra driver to retrieve this data. By actually looking into the implementation of getBytes (which first looks up a deserializer for a specific codec, in this case ByteBuffer), we realized that we were wrong.
Surprisingly, the lookup of the deserializer (method lookupCodec) took almost all the time! As we only want to retrieve the raw ByteBuffer without any conversion or object churn, we looked in the DataStax documentation for better alternatives to retrieve a ByteBuffer and found the method getBytesUnsafe, which does the job without looking up a deserializer and delivers a direct copy of the ByteBuffer.
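The difference between the two access paths can be sketched with a simplified stand-in model. This is not the DataStax driver source: Codec, BlobCodec, CodecRegistry and Row below are hypothetical types written for illustration. Only the method names getBytes, getBytesUnsafe and lookupCodec are taken from what we saw in the driver. Using ByteBuffer.duplicate() for the raw path is an assumption of this model.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in model, NOT the DataStax driver source.
// Codec, BlobCodec, CodecRegistry and Row are hypothetical illustration types.
final class RawBytesSketch {

    /** Hypothetical codec that handles the stored bytes of one column type. */
    interface Codec {
        boolean accepts(String cqlType);
        ByteBuffer decode(ByteBuffer raw);
    }

    /** Hypothetical codec for blob columns; it has nothing to convert. */
    static final class BlobCodec implements Codec {
        @Override public boolean accepts(String cqlType) { return "blob".equals(cqlType); }
        @Override public ByteBuffer decode(ByteBuffer raw) { return raw.duplicate(); }
    }

    /** Hypothetical registry that searches its codecs on every call. */
    static final class CodecRegistry {
        private final List<Codec> codecs = new ArrayList<>();
        int lookups = 0; // counts how often the lookup path is taken

        void register(Codec codec) { codecs.add(codec); }

        Codec lookupCodec(String cqlType) {
            lookups++;
            for (Codec codec : codecs) {
                if (codec.accepts(cqlType)) {
                    return codec;
                }
            }
            throw new IllegalArgumentException("no codec for " + cqlType);
        }
    }

    /** Hypothetical row holding a single blob column value. */
    static final class Row {
        private final ByteBuffer value;
        private final CodecRegistry registry;

        Row(ByteBuffer value, CodecRegistry registry) {
            this.value = value;
            this.registry = registry;
        }

        // Codec path: pays for a lookup even though a blob needs no conversion.
        ByteBuffer getBytes() {
            return registry.lookupCodec("blob").decode(value);
        }

        // Raw path: hands out the stored bytes without consulting the registry.
        ByteBuffer getBytesUnsafe() {
            return value.duplicate();
        }
    }

    static CodecRegistry registryWithBlobCodec() {
        CodecRegistry registry = new CodecRegistry();
        registry.register(new BlobCodec());
        return registry;
    }

    public static void main(String[] args) {
        CodecRegistry registry = registryWithBlobCodec();
        Row row = new Row(ByteBuffer.wrap(new byte[] {1, 2, 3}), registry);

        ByteBuffer viaCodec = row.getBytes();
        ByteBuffer raw = row.getBytesUnsafe();

        System.out.println("same content: " + viaCodec.equals(raw));
        System.out.println("codec lookups: " + registry.lookups);
    }
}
```

In this model both calls return the same bytes, but only getBytes touches the registry. When the application does its own deserialization on the raw bytes anyway, as we do, that lookup is pure overhead on every column read.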
We found this bottleneck with Dynatrace in less than 5 minutes and fixed it within a day, gaining a substantial performance improvement!