A running application is represented through some state stored in memory and calculations carried out by the CPU. Typically performance problems materialize in these two areas. In my last post I’ve explained garbage collection and how to hunt down memory problems in Node.js.
In this blog post I’ll cover the CPU aspects of Node.js including the event loop and how to track down tasks that consume too much CPU time and slow down the whole process.
How “single threaded” is Node.js really?
The famous statement ‘Node.js runs in a single thread’ is only partly true. Actually only your ‘userland’ code runs in one thread. Starting a simple node application and looking at the processes reveals that Node.js in fact spins up a number of threads.
This is because Node.js maintains a thread pool to delegate synchronous tasks to, while Google V8 creates its own threads for tasks like garbage collection.
Finally understanding async I/O & Node.js event loop
Node.js uses libuv which provides the aforementioned thread pool and an event loop to achieve asynchronous I/O. Many articles have been written trying to explain the event loop and still it took me some time to hopefully really understand what this is all about.
Let’s look at a code sample that may be part of a web application and analyze what happens from a threading standpoint while it gets executed:
The main thread first executes the code and calls the readfile() function.
Reading a file is a synchronous input output task and can take its time. In synchronous programming the thread running readfile() would now pause and wait until the system call returns the contents of the file. That’s why for client-server environments (like web applications), most platforms will create one thread per request. A php application within Apache using mod_php works exactly like that.
Our “single threaded” Node.js works differently. To not block the main thread, it delegates the readfile() task to libuv. Libuv will then push the callback function to a queue and use some rules to determine how the task will be executed. In some cases it will now use the thread pool to load off the work but it may for instance also employ environment specific mechanisms that allow creating asynchronous tasks to be handled directly by the operating system.
The event loop continuously runs over the callback queue and will execute a callback if the associated task has been finished. The execution of the callback – and this is very important – will again run on the main thread.
A running Node server can be seen as a cascade of callbacks with the function to run for every incoming request being the root element.
The code example above will work fine and will log the file contents to the console. But what if someone decides to perform a CPU consuming operation like calculating some prime numbers inside the callback. This will obviously block the main thread, prevent new requests from being processed and cause the application to slow down.
Using CPU sampling to trace down CPU-bound operations
The code above will run the CPU profiler for 5 seconds every 30 seconds and store the sampling data under a timestamp based filename. Opening such a file reveals a JSON tree structure and there are several ways to analyze it.
Chrome Developer Tools
For this example, I’ve used an express route that calculates some prime numbers. This is a CPU intensive task and by sorting the call tree by ‘heaviness’ – which means how much CPU time it used in the examined period – we can quickly identify the calculation as our culprit. Granted, this is an easy and obvious one and it’s not always a single function but some execution path that causes an application to slow down and sorting by heaviness will only show noise in such cases.
To analyze such problems, we need an eye friendly visualization that presents us the whole call stack.
A popular solution for that are flame charts and here’s a story by Netflix on how they used flame charts to trace down a bug in their express application.
Using sunburst charts
My personal preference are sunburst charts though and I’ve used them lately to show the huge effect of running express with NODE_ENV set to development versus production. Later I will show how to create them. Now let’s explore what they can tell us.
On the left we see the sunburst chart of a CPU sample taken from an express application running in development mode and on the right the same application in production mode.
It becomes instantly clear that there is much more going on in development mode.
The blue center is the main function. Every function called directly from main is placed in the adjacent row of the chart. If such a function calls another function it will be placed into an adjacent segment to the the calling function.
One circle segment spans all its children and the distance from the center is the depth of a function within the call tree. The angle measure represents the time a function and its children spent on the CPU with 365° being 100% of the time.
Looking at the chart we see that especially on the top left there is a largely fragmented section which uses about 25% of the CPU time. How do we know which functions those segments represent? Click here to open the chart in your browser and hover over the segments to display the function names – you can even click on a segment to zoom in.
I’m sure you will find out that the aforementioned 25% segment is obviously dedicated to compiling Jade templates. So for every request the templates are freshly compiled. This makes sense in development mode but in my blog post on this topic I also showed that this is the default when NODE_ENV isn’t explicitly set which can be a problem obviously.
Now that we have used CPU sampling and sunburst charts to trace down a complex CPU related problem the only remaining question is how to create such awesome sunburst charts.
Creating sunburst charts with D3.js
There are many code examples around and when searching for a starting point a snippet by my fellow teammate Martin Etmajer popped up. His way of showing labels would not work for my highly fragmented tree so I adapted the script to use show the function name on hover. Find my full script here.
Wrapping up on CPU analysis & node.js
Using these techniques, it is easy to enhance applications with tracing facilities that can even be switched on during runtime to introspect what an application is actually doing.
For small, single tier applications this can provide a minimal viable monitoring solution.
Node.js and Dynatrace
To see the big picture, including which transactions are passing through your node application, you may want to use Dynatrace. We provide a Node.js agent which will, in conjunction with the various other technologies supported, help you to understand your application as a whole.
This blog post is part of my ongoing research on instrumenting Node.js, and some of the tools introduced here are already in our product or will be incorporated into it soon.
Please feel free to contact me anytime if you have questions about my blog posts or how to instrument node. You can reach me on Twitter at @dkhan.