In this article we are going to simulate a performance problem in a very simple Node.js Express-based application. This will demonstrate how you can use flamegraphs, via the new Profiling view in appmetrics-dash, to spot potential Node.js performance issues in development before they reach production.

Setting up

First, we need a really simple application that we can enable for monitoring via appmetrics-dash:

require('appmetrics-dash').monitor();
const express = require('express');
const app = express();

// Very simple handler.
app.get('/simple', function simpleHandler(req, res) {
  res.send(`Hello World!\n`);
});

app.listen(3000);

To enable the appmetrics-dash dashboard you just need to npm install appmetrics-dash and call monitor() at the start of your program.
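
If you are following along, installing the dependencies and starting the application looks something like this (assuming the code above is saved as app.js, a filename chosen just for this example):

$ npm install express appmetrics-dash
$ node app.js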

If you run this locally the dashboard will be visible at http://localhost:3001/appmetrics-dash. You can drive a tiny bit of load through it from the command line like this:
$ true; while (($?==0)); do sleep 1; curl http://127.0.0.1:3000/simple; done;
and response time information will be visible on the “Average Response Times (Top 5)” table.

We could drive more load by removing the sleep 1 and even more by running the same command from multiple terminals. The point of this article is to show you how to spot a problem before your application is under stress, so we will keep the sleep and use one terminal.

Average Response Time for /simple

This is a good point to introduce the Profiling tab in appmetrics-dash. If you click the Profiling tab you will see an empty graph. Profiling is disabled by default, as profiling your code adds an overhead to the Node.js process. If you click the menu icon in the top left you will see an “Enable Profiling” option and a “Clear Profiling Data” option to reset the graph.

If you enable profiling, appmetrics-dash will display a flamegraph showing the profiling data. If you click on a cell you will see more details about the function that cell represents: what proportion of time was spent in that function and how it was called. In this simple example there is so little work in the handler that we are really just profiling Express.

Using the flamegraph

Navigating the flamegraph is fairly easy: each block represents a function, and the width of each block represents the time spent in that function. The block below a function is its caller, and the blocks above it are the functions it calls. Let’s find the simpleHandler() function from the sample above. Clicking on a block at the highest part of the graph will display its call stack in the right-hand panel. One of the ancestors of the selected function is the simpleHandler() function. Try selecting blocks a few levels down to find the simpleHandler() function. The “Function Details” box shows the percentage of accumulated time spent in that function and its children. The self percentage is low, as not much time was spent inside the simpleHandler() function itself. On the flamegraph you can see that its block is wide but mostly covered by the block representing the send() function in Express, which is the only function simpleHandler() calls.

Flamegraph profiling requests to /simple

There was no need to give our handler function a name, but without it the stack trace would show it as <anonymous function>, which makes it harder to locate. It’s usually a good idea to name functions to make stack traces from errors or profiling easier to read.
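
For example, the two handlers below behave identically, but only the second shows up under a useful name in a stack trace or flamegraph (the /anon and /named routes are purely illustrative):

// Anonymous handler: appears as <anonymous function> when profiling.
app.get('/anon', function (req, res) {
  res.send(`Hello World!\n`);
});

// Named function expression: appears as namedHandler when profiling.
app.get('/named', function namedHandler(req, res) {
  res.send(`Hello World!\n`);
});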

Finding a problem

We can now add another handler to the application to simulate a problem where one handler has a slower response time. Add the following code to the application:

// Handler that synchronously calls one other, slow, function.
app.get('/sync', function syncHandler(req, res) {
  slowFunction();
  res.send(`Hello World!\n`);
});

function slowFunction() {
  let stop = Date.now() + 5;
  // A busy wait - to burn CPU.
  while (Date.now() < stop);
}

This adds a similar endpoint, but with a 5 millisecond delay before sending the response. Unless you are trying to demonstrate CPU profiling, waiting in a busy loop is a terrible thing to do in Node.js. In our case, however, it gives a nice obvious problem to look at and keeps the demonstration code nice and simple. I've used a busy loop because it's easy to control for the example. A blocking synchronous call, like writeFileSync(), would also accumulate ticks while blocking, and that time would show up on the profiling view too, but it wouldn't give as reliable a delay for this demonstration.
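
For comparison, a version using a real blocking call might look like the sketch below; the file path and payload are placeholders, not part of the sample application. Its time would show up in the flamegraph in the same way, but the delay would vary with the filesystem:

const fs = require('fs');

// A blocking synchronous write: burns event-loop time like the busy loop,
// but the delay depends on the filesystem rather than being a fixed 5ms.
function slowSyncFunction() {
  fs.writeFileSync('/tmp/appmetrics-demo.txt', 'x'.repeat(1024 * 1024));
}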

Open up a command line and run:
$ true; while (($?==0)); do sleep 1; curl http://127.0.0.1:3000/sync; done;
and go to http://localhost:3001/appmetrics-dash, where you can see the response time for /sync is over 5 milliseconds:

Average Response Time for /sync

Enable profiling and look at the profiling tab to see where the time is being spent:

Flamegraph profiling requests to /sync

We can see that most of the time is spent in slowFunction(), which shouldn't come as a surprise.

We can't really fix this problem as it's artificial (spoiler: deleting the busy loop is the answer), but we can demonstrate how you can use profiling to identify a problem even when it isn't impacting your response time under light load.
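
As an aside, in a real application where you couldn't simply delete the CPU-heavy work, one common fix is to move it off the main thread. Here is a minimal sketch using Node's worker_threads module; the structure and names are illustrative, not part of the sample app:

const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  // Main thread: run the CPU-heavy work in a worker so the event loop stays free.
  module.exports = function runSlowWork() {
    return new Promise((resolve, reject) => {
      const worker = new Worker(__filename);
      worker.once('message', resolve);
      worker.once('error', reject);
    });
  };
} else {
  // Worker thread: the busy wait burns CPU here, not on the main event loop.
  const stop = Date.now() + 5;
  while (Date.now() < stop);
  parentPort.postMessage('done');
}

Spawning a worker per call has its own overhead, so for work this small a pre-created worker pool would be more realistic; the point is only that the CPU time would no longer block request handling. For this article, though, we'll demonstrate how the problem can be hidden rather than fixed.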

Add the code below to the test program to create two new handlers:

// Handler that synchronously calls the slow function, after sending the response.
app.get('/sync2', function syncHandler2(req, res) {
  res.send(`Hello World!\n`);
  slowFunction();
});

// Handler that asynchronously calls the slow function via setImmediate().
app.get('/async', function asyncHandler(req, res) {
  setImmediate(slowFunction);
  res.send(`Hello World!\n`);
});

The first of these moves the call to the slow function to after the response has been sent, and the second uses setImmediate() to push it onto the next turn of the event loop.
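
To see why the response still goes out first, here is a tiny standalone illustration of setImmediate() ordering:

// setImmediate() queues its callback for a later turn of the event loop,
// so the synchronous code below it runs first.
setImmediate(function deferredWork() {
  console.log('2: runs later, like slowFunction()');
});
console.log('1: runs first, like res.send()');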

If we use the command line to make requests again:
$ true; while (($?==0)); do sleep 1; curl http://127.0.0.1:3000/sync2; curl http://127.0.0.1:3000/async; done;
and monitor the dashboard as before, we can see that the response time table now looks like this:

Average Response Time for /sync2 and /async

Based on this we might assume everything is now fine. Both changes fix the externally visible symptom, a slow response time, but both brush the problem under the carpet. As JavaScript code in Node.js runs on a single thread, we can predict when this is going to start hurting us: at around 200 requests per second (1000ms / 5ms), the 5ms delays alone will add up to a full second of work every second and we won't be able to put any more tasks on the event loop.

However, using the flamegraph to show the profiling information, we can see the problem still exists and has just been moved. The program is still spending a large percentage of its time in slowFunction() in comparison to the rest of the code.

Flamegraph profiling requests to /sync2 and /async

The highlighted cell shows where slowFunction() ran asynchronously from the original request that triggered it. As it was a callback passed to setImmediate(), rather than part of a callback passed to Express, it ran directly on the event loop and has a much shorter call stack.

Although this example is artificial, it highlights an important point. Node.js code is asynchronous, so it's easy to have tasks build up on the event loop. This won't start to cause a problem until you reach a certain level of load, for example in production on Black Friday. This is where profiling in development can help you.

V8 samples the call stack at a set frequency and reports where your JavaScript code is spending its time. Since V8 is the engine running your JavaScript code, there's no way to hide anything from it. Profiling gives us x-ray vision, showing issues that we can't detect from the outside. You can identify the problems above even when load is low, with a request only coming in once every second. That makes it practical to profile in development, while the mental image of how you expect your program to behave is still fresh in your mind and issues are easier to fix.
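
As an aside, if you ever want the raw V8 sampling data without a dashboard, Node.js can produce it directly with its built-in profiling flags (app.js is the filename assumed earlier; the generated log file name will vary):

$ node --prof app.js
$ node --prof-process isolate-*-v8.log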

Using profiling, it is possible to see where your program is spending its time and identify optimisation targets before your users start to see a problem. Because profiling information comes directly from the V8 engine inside Node.js, you can spot issues that may not be visible from external metrics until they become production problems. Using the Profiling view in appmetrics-dash is very easy. If you've never done any profiling before, you may be surprised to find what your application is doing behind the scenes.
