Most developers learn best by example, and examples naturally tend to simplify matters and omit things that aren’t essential for understanding. This means that a “Hello World” example, when used as the starting point for an application, may not be suitable for production scenarios at all.
I started using Node.js like that, and I have to confess that it took me almost two years to quantify the huge performance impact of omitting a single environment variable. In fact, it was pure coincidence that I had even done it right in my previous projects.
Mind your environment
Environment variables are often used for distinguishing between environments like “production” or “development”. Depending on those variables an application may turn debugging on or off, connect to a specific database, or listen on a specific port.
In Node.js there is a convention to use a variable called NODE_ENV to set the current mode.
For my Node.js projects I mostly use Express.js, a lightweight web application framework. If you look at its Hello World example, you might think Express doesn’t care about NODE_ENV, because the example never mentions it. A look at the source code tells a slightly different story, though: Express does read NODE_ENV and defaults to ‘development’ if it isn’t set. The value is exposed to applications via app.get(‘env’) and can be used to apply environment-specific configuration as explained above, but whether you use it is up to you.
One strength of Node.js is its module ecosystem, and a typical application uses plenty of modules. For Express, I always install at least a template engine, which is easily added to a project. No further configuration is needed; it just works out of the box. And that is basically where the trouble starts.
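To make the defaulting concrete, here is a minimal sketch in plain Node.js (no Express required) that mirrors the behavior described above: read NODE_ENV, fall back to ‘development’ when it is missing, and pick an environment-specific configuration. The helper names resolveEnv and configFor, as well as the settings themselves, are hypothetical and only for illustration; in an Express app you would read the value via app.get(‘env’).

```javascript
// Mirrors the defaulting described above: an unset (or empty) NODE_ENV
// falls back to "development". resolveEnv is a hypothetical helper name.
function resolveEnv(vars) {
  return vars.NODE_ENV || "development";
}

// Hypothetical per-environment settings (debugging on/off, listening port).
const CONFIGS = {
  development: { debug: true, port: 3000 },
  production: { debug: false, port: 8080 },
};

// Pick the settings for an environment; unknown names fall back to development.
function configFor(env) {
  return CONFIGS[env] || CONFIGS.development;
}

const env = resolveEnv(process.env);
console.log(`Running in ${env} mode, debug=${configFor(env).debug}`);
```

Note that nothing forces you to call something like configFor: Express resolves the environment either way, which is exactly why omitting NODE_ENV can go unnoticed.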
The hidden powers of NODE_ENV
While you might think that you are environment-agnostic by not using NODE_ENV or app.get(‘env’), Express takes care of that for you and will, for instance, enable the view cache only when NODE_ENV is set to ‘production’. This makes sense, because during development you don’t want to restart your app to clear the cache every time you change some markup. Consequently, it also means that in development mode your views are parsed and compiled anew for every request. Do you remember that development mode is the default if you omit NODE_ENV?
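The cost difference comes down to caching compiled templates. The following is a simplified sketch, not Express’s or Jade’s actual code: a hypothetical createRenderer with a toy compileTemplate that replaces #{key} placeholders, where the cache is switched on only for production, as Express does for its view cache. With the cache off, every render call compiles the template again.

```javascript
// A toy "compiler": turns a template string into a render function that
// replaces #{key} placeholders with values from the locals object.
function compileTemplate(source) {
  return (locals) =>
    source.replace(/#\{(\w+)\}/g, (_, key) => String(locals[key] ?? ""));
}

// Hypothetical renderer with an optional compile cache, loosely modelled on
// the "view cache" setting: with the cache on, each template is compiled once;
// with it off, every render call compiles again.
function createRenderer({ cache, loadSource }) {
  const compiled = new Map();
  let compileCount = 0;

  function render(name, locals) {
    let fn = cache ? compiled.get(name) : undefined;
    if (!fn) {
      fn = compileTemplate(loadSource(name)); // stands in for reading + parsing the file
      compileCount += 1;
      if (cache) compiled.set(name, fn);
    }
    return fn(locals);
  }

  return { render, compiles: () => compileCount };
}

// Enable the cache only in production, as Express does for "view cache".
const isProduction = (process.env.NODE_ENV || "development") === "production";
const renderer = createRenderer({
  cache: isProduction,
  loadSource: () => "<h1>Hello #{name}</h1>",
});
console.log(renderer.render("index", { name: "World" })); // <h1>Hello World</h1>
```

Under load, the uncached branch runs once per request, which is where the extra CPU time goes.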
In my applications I always used some conditions that relied on the “env” variable, so I launched them with something like
NODE_ENV=<environment> node server.js
But what would happen if I didn’t? Would there be a notable performance hit?
I’ve created a little sample app that connects to a database, calls some external services and renders a page using Jade.
After instrumenting it with Dynatrace, I ran several tests using the Apache benchmarking tool (ab), like this:
ab -c 100 -n 10000 http://<myhostname>:3000/
This means: request this page 10,000 times, with at most 100 requests in flight at once.
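To illustrate what the -n and -c flags mean, here is a minimal sketch of a concurrency-limited request loop in plain Node.js. runWithConcurrency is a hypothetical helper, not part of ab or Node.js, and fakeRequest merely waits a few milliseconds instead of calling a real server; the point is the scheduling: a fixed number of workers each keep claiming the next request until all have been sent.

```javascript
// Run `total` tasks, never allowing more than `concurrency` in flight at once.
// runWithConcurrency is a hypothetical helper, not part of ab or Node.js.
async function runWithConcurrency(total, concurrency, task) {
  let started = 0;
  let completed = 0;
  let inFlight = 0;
  let maxInFlight = 0;

  // Each worker keeps claiming the next request number until all are claimed.
  async function worker() {
    while (started < total) {
      const id = started++;
      inFlight += 1;
      maxInFlight = Math.max(maxInFlight, inFlight);
      try {
        await task(id);
        completed += 1;
      } finally {
        inFlight -= 1;
      }
    }
  }

  // Start `concurrency` workers (never more than there are requests).
  const workers = Array.from({ length: Math.min(concurrency, total) }, () => worker());
  await Promise.all(workers);
  return { completed, maxInFlight };
}

// Simulated request: waits a few milliseconds instead of calling a real server.
const fakeRequest = () => new Promise((resolve) => setTimeout(resolve, Math.random() * 5));

// Same shape as `ab -c 100 -n 10000`.
runWithConcurrency(10000, 100, fakeRequest).then((stats) => console.log(stats));
```

A real benchmark tool also records latencies and errors; this sketch only shows how the two flags bound the load.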
Using Dynatrace Web I charted the number of requests versus CPU usage and the results were really stunning:
The green bars show the number of requests and the blue line represents the CPU usage. We clearly see that by setting NODE_ENV to production, the number of requests Node.js can handle jumps sharply (in development mode it manages only around a third of that), while the CPU usage even drops slightly.
Let me emphasize this: Setting NODE_ENV to production makes your application 3 times faster!
Let’s peek into the CPU
At this point I was curious what was really going on in the CPU, so I profiled it with v8-profiler. My sample app already contains the code for triggering CPU snapshots, and I’ll cover that in an in-depth blog post soon.
My favorite way of displaying trees such as call stacks is the sunburst chart, which works like a pie chart with several rings, each ring representing one level of the tree structure.
In this case, the whole circle represents the measured time slice, and the arc length of each slice represents the time a given function occupied the CPU. As call stacks form a tree, the center represents the root node, and a slice’s distance from the center equals its depth in the tree.
There is a lot going on, and if we hover over the individual slices, we see that the CPU is busy rendering the Jade templates most of the time.
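The mapping from call tree to sunburst can be sketched in a few lines. This is an illustrative layout function, not the code behind the charts in this post: layoutSunburst and the node shape (name, self time, children) are assumptions, and the profile below uses made-up timings rather than measured data. Each node’s angular span is proportional to its total time (its own time plus everything it called), and its ring is its depth.

```javascript
// Total CPU time of a node: its own time plus that of everything it called.
function totalTime(node) {
  return (node.children || []).reduce((sum, child) => sum + totalTime(child), node.self);
}

// Assign each call-tree node an arc: the angular span is proportional to the
// node's total time, and the ring (depth) is its distance from the root.
function layoutSunburst(root) {
  const rootTotal = totalTime(root);
  const arcs = [];

  function visit(node, depth, startAngle) {
    const span = (totalTime(node) / rootTotal) * 360;
    arcs.push({ name: node.name, depth, startAngle, endAngle: startAngle + span });
    // Children are laid out side by side inside their parent's arc.
    let childStart = startAngle;
    for (const child of node.children || []) {
      visit(child, depth + 1, childStart);
      childStart += (totalTime(child) / rootTotal) * 360;
    }
  }

  visit(root, 0, 0);
  return arcs;
}

// Hypothetical profile with made-up timings (ms), not measured data.
const profile = {
  name: "(root)",
  self: 0,
  children: [
    { name: "renderTemplate", self: 20, children: [{ name: "compile", self: 60 }] },
    { name: "handleRequest", self: 20 },
  ],
};

for (const arc of layoutSunburst(profile)) {
  const indent = "  ".repeat(arc.depth);
  console.log(`${indent}${arc.name}: ${arc.startAngle.toFixed(0)} to ${arc.endAngle.toFixed(0)} deg`);
}
```

A child’s arc always lies within its parent’s arc, which is why hot functions deep in the stack show up as wide outer slices.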
After switching NODE_ENV to production the result looks completely different:
It immediately becomes clear that there is much less going on in the CPU than before. Now only a fraction of the time is spent on template rendering and, as we already found out, more time can be spent serving requests.
I learned a lot during these experiments, most importantly to never, ever run Express in production without setting NODE_ENV accordingly. I will also push for displaying the current mode in Dynatrace to save our clients from bad surprises in the future.
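One cheap safeguard along these lines is a startup check that makes the current mode visible. The sketch below is an assumption, not a feature of Express or Dynatrace: nodeEnvWarning is a hypothetical helper that returns a warning when NODE_ENV is missing or not ‘production’ (the only value for which Express enables the view cache by default), and the app logs it together with the effective mode.

```javascript
// Hypothetical startup check: warn loudly when NODE_ENV is missing, since
// Express then silently falls back to "development" (no view cache by default).
function nodeEnvWarning(vars) {
  if (!vars.NODE_ENV) {
    return 'NODE_ENV is not set; Express will default to "development" and not enable the view cache.';
  }
  if (vars.NODE_ENV !== "production") {
    return `NODE_ENV is "${vars.NODE_ENV}"; the view cache is not enabled by default.`;
  }
  return null;
}

const warning = nodeEnvWarning(process.env);
if (warning) console.warn(warning);
console.log(`Current mode: ${process.env.NODE_ENV || "development"}`);
```

This only reads the variable; it does not set it, so the environment still controls the mode.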
Edit: Alright – but how do I set NODE_ENV properly?
I was asked that question on Facebook after publishing this post, and I agree that it makes sense to cover it.
NODE_ENV works like any other environment variable (e.g. PATH), and how you set it depends on your platform:
- Linux and OSX: export NODE_ENV=production
- Windows (cmd): SET NODE_ENV=production
On Linux and OSX you can also set it for a single launch of your application, like this:
NODE_ENV=production node myapp.js
You may also use a module like dotenv to load it from an .env file in your application directory. That said, you should avoid setting the environment directly in your code, because this contradicts the purpose of environment variables.
Creating the chart that shows the number of requests Node can handle took me around five minutes. Try it out yourself. Dynatrace is available for free if you use it on your local machine and can be tested with full functionality for 30 days if you deploy it on your web server.