Node.js and the V8 JavaScript engine have comprehensive facilities for development-time tracing, profiling and debugging, and the NPM ecosystem allows additional diagnostic modules to be written. However, there are known and well-documented issues with the lack of post-mortem diagnostic facilities for Node.js in production environments; see, for example, Yunong Xiao’s presentation “Debugging Node.js in Production” at Node.js Interactive 2015. The Node.js community is responding with a series of proposals in the Post Mortem Diagnostics Working Group.

Problem diagnosis in production environments is heavily constrained by the need to maintain the performance and availability of the application. It is generally unacceptable to use development tools such as live-attached debuggers or patched builds to diagnose problems in production. Existing post-mortem facilities in Node.js core consist of stdout/stderr messages and optional triggering of core dumps. The command ‘node --v8-options’ lists a number of diagnostic options provided by the V8 JavaScript engine. For example, the --trace_gc option traces the activity of the V8 garbage collector. Vital for production post-mortem diagnosis is the --abort_on_uncaught_exception option, which triggers a core dump when an uncaught exception is thrown. The Linux gcore command or Windows Task Manager can also be used to trigger core dumps of Node.js instances.
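As a minimal sketch of the ‘node --v8-options’ check described above, the script below runs the current node binary with that option and reports whether the two flags mentioned here are listed. It assumes the listing format of recent V8 versions (one ‘--name (description)’ entry per line, with underscores or hyphens depending on the version); parseV8Flags and listV8Flags are illustrative names only.

```javascript
const { spawnSync } = require('child_process');

// Parse `node --v8-options` output into a Set of flag names.
// Depending on the V8 version, flags are printed with underscores
// (--trace_gc) or hyphens (--trace-gc), so names are normalised to hyphens.
function parseV8Flags(text) {
  const flags = new Set();
  for (const line of text.split('\n')) {
    // Flag entries start (after indentation) with "--name"; continuation
    // lines such as "type: bool  default: ..." do not.
    const match = /^\s*--([A-Za-z0-9_-]+)/.exec(line);
    if (match) flags.add(match[1].replace(/_/g, '-'));
  }
  return flags;
}

// Run the given node binary (default: the current one) with --v8-options.
function listV8Flags(nodeBinary = process.execPath) {
  const result = spawnSync(nodeBinary, ['--v8-options'], { encoding: 'utf8' });
  if (result.error) throw result.error;
  return parseV8Flags(result.stdout);
}

const available = listV8Flags();
for (const flag of ['trace-gc', 'abort-on-uncaught-exception']) {
  console.log(`--${flag}: ${available.has(flag) ? 'listed' : 'not listed'}`);
}
```

A listed flag is then passed on the command line in the usual way, for example ‘node --abort_on_uncaught_exception app.js’, where app.js stands in for the application’s entry point.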

The areas that the IBM Runtimes development team are focusing on in 2016 are a summary diagnostic report for Node.js, triggered on a variety of failure events, and support for off-line viewing and analysis of core dumps from Node.js applications using the LLDB debugger library.

1. NodeReport – a human-readable summary report from Node.js

In the event of a failure, the Node.js runtime will typically write some diagnostic messages to the stdout and/or stderr streams, but it is very useful to capture additional information in a summary report written to a file. At a minimum, this allows some initial analysis of the problem and correlation with other occurrences of the failure. In some cases it may be sufficient to fully diagnose the problem. Examples of a prototype report are shown below, triggered by out-of-memory, unhandled exception and looping application failures. The NodeReport in these examples consists of a header section containing the event type, date, time, PID and Node.js version; a section containing the JavaScript stack trace; and a section containing V8 heap information. Existing V8 APIs are used to obtain the stack traces and V8 heap information.
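The NodeReport prototype is a native module, but the shape of its report can be approximated in plain JavaScript. The sketch below is hypothetical: buildSummaryReport and writeSummaryReport are illustrative names, not NodeReport’s API, and the report file naming scheme is an assumption. It uses process.pid, process.version, an Error stack and v8.getHeapSpaceStatistics() to produce the three sections described above. Note that JavaScript cannot run once V8 has hit a fatal out-of-memory error, so a sketch like this only covers the triggers that leave the event loop usable.

```javascript
const fs = require('fs');
const path = require('path');
const v8 = require('v8');

// Hypothetical helper: build a plain-text report with a header section,
// a JavaScript stack trace section and a V8 heap section.
function buildSummaryReport(eventType, err = new Error('report requested')) {
  const lines = [
    '=== Summary report (sketch) ===',
    `Event: ${eventType}`,
    `Date/time: ${new Date().toISOString()}`,
    `PID: ${process.pid}`,
    `Node.js version: ${process.version}`,
    '',
    '=== JavaScript stack trace ===',
    String(err && err.stack ? err.stack : err),
    '',
    '=== V8 heap spaces (bytes) ===',
  ];
  // One line per V8 heap space: allocated, used and available bytes.
  for (const s of v8.getHeapSpaceStatistics()) {
    lines.push(
      `${s.space_name}: size=${s.space_size} ` +
        `used=${s.space_used_size} available=${s.space_available_size}`
    );
  }
  return lines.join('\n') + '\n';
}

// Hypothetical naming scheme: report.<event>.<pid>.txt in the given directory.
function writeSummaryReport(eventType, err, dir = process.cwd()) {
  const file = path.join(dir, `report.${eventType}.${process.pid}.txt`);
  fs.writeFileSync(file, buildSummaryReport(eventType, err));
  return file;
}

console.log(buildSummaryReport('manual'));
```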

Example 1 – Unhandled exception

In this example the Node.js instance has terminated because of an uncaught exception. The NodeReport shows that the exception message was “Cannot find module ‘/home/rchamberlain/test/unknown.js’”, with a JavaScript stack trace that shows the code in module.js that was attempting to load the module. Here there is quite likely sufficient information to fully diagnose the problem.


[Figure: NodeReport triggered by an unhandled exception]
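To illustrate the unhandled-exception trigger at the JavaScript level, the sketch below defines an ‘uncaughtException’ listener that formats a short report and then exits with code 1, and reproduces the error from this example without crashing by catching the failed require(). formatExceptionReport and installExceptionReporter are hypothetical names, and this is not necessarily how the prototype NodeReport hooks exceptions.

```javascript
// Hypothetical helper: format a short report for an exception.
function formatExceptionReport(err) {
  const isError = err instanceof Error;
  return [
    'Event: exception',
    `PID: ${process.pid}  Node.js version: ${process.version}`,
    `Message: ${isError ? err.message : String(err)}`,
    'JavaScript stack trace:',
    isError ? String(err.stack) : '(no stack: a non-Error value was thrown)',
  ].join('\n');
}

// Hypothetical installer: write the report, then exit with code 1 as Node.js
// does by default for an uncaught exception. Replace process.exit(1) with
// process.abort() if a core dump is also wanted.
function installExceptionReporter(write = (text) => process.stderr.write(text + '\n')) {
  process.on('uncaughtException', (err) => {
    write(formatExceptionReport(err));
    process.exit(1);
  });
}

// Reproduce the error from this example without crashing: requiring a
// missing file throws an Error whose code is 'MODULE_NOT_FOUND'.
let demoReport = '';
try {
  require('/home/rchamberlain/test/unknown.js');
} catch (err) {
  demoReport = formatExceptionReport(err);
}
console.log(demoReport);
```

In an application, installExceptionReporter() would be called once at startup, before any code that might throw.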

Example 2 – Out of Memory Error

In this example the Node.js instance has terminated because of a memory allocation failure. The report identifies the problem as a failure to allocate space in the JavaScript heap, and identifies the code that was running when the failure occurred. It also shows that approximately 1 GB of memory has been allocated to the old space region of the V8 heap, with less than 1 MB of available space. If the JavaScript code running at the time of the failure was responsible for the excessive memory use, this may be sufficient to fully diagnose the problem; otherwise it may be necessary to capture a core dump so that the contents of the V8 heap can be analyzed.


[Figure: NodeReport triggered by an out-of-memory error]
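The old space figures in this report are the kind of data V8 exposes to JavaScript through v8.getHeapSpaceStatistics(). As a minimal sketch, assuming the ‘old_space’ name used by current V8 versions and an illustrative 90% threshold, a running application could summarize the old space and warn when the whole heap approaches its limit (from v8.getHeapStatistics()). The helper names are hypothetical, and a live check like this complements, rather than replaces, a post-mortem report or core dump.

```javascript
const v8 = require('v8');

const MB = 1024 * 1024;

// Look up one entry in data shaped like v8.getHeapSpaceStatistics() output.
function findSpace(stats, name) {
  return stats.find((s) => s.space_name === name);
}

// Summarise the old space in the same terms as the report in Example 2.
function describeOldSpace(stats) {
  const old = findSpace(stats, 'old_space');
  if (!old) return 'old_space: not reported by this V8 version';
  return `old_space: ${(old.space_size / MB).toFixed(1)} MB allocated, ` +
    `${(old.space_available_size / MB).toFixed(1)} MB available`;
}

// Hypothetical rule: warn when used heap exceeds `fraction` of the heap size
// limit. Little *available* old space is not necessarily alarming on its own,
// because V8 grows the space on demand until the heap limit is reached.
function heapNearLimit(heapStats, fraction = 0.9) {
  return heapStats.used_heap_size > fraction * heapStats.heap_size_limit;
}

const stats = v8.getHeapSpaceStatistics();
const heap = v8.getHeapStatistics();
console.log(describeOldSpace(stats));
console.log(`${((100 * heap.used_heap_size) / heap.heap_size_limit).toFixed(1)}% of heap limit used`);
if (heapNearLimit(heap)) {
  console.warn('Warning: V8 heap is close to its size limit');
}
```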

Example 3 – Looping JavaScript application

In this example the Node.js instance has not terminated, but has become unresponsive or is performing poorly. The report has been triggered by sending a SIGUSR2 signal to the Node.js process. The JavaScript stack trace in the report shows that the running code was a function ‘my_listener’ in the source file ‘/home/rchamberlain/test/loop.js’. In a case like this it may be useful to trigger several reports from the Node.js instance at intervals, to confirm which code is causing the poor responsiveness or performance.


[Figure: NodeReport triggered by a SIGUSR2 signal]
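As a sketch of the signal trigger on a POSIX system, the code below installs a SIGUSR2 listener that writes a numbered report each time the signal arrives, including the CPU time used since the previous report, so that successive reports can be compared. An important caveat, repeated in the code: a JavaScript listener only runs when the event loop is free, so it cannot interrupt a function that is looping synchronously, and its stack trace shows the listener rather than the looping code. Capturing a stack like the ‘my_listener’ one in this example needs native support such as the NodeReport prototype. installSignalReporter is a hypothetical name.

```javascript
// Hypothetical helper: write a numbered report on each delivery of `signal`.
// Returns a function that removes the listener again.
// CAVEAT: JavaScript signal listeners run on the event loop, so this cannot
// interrupt code that is looping synchronously; capturing the looping
// function's stack needs native support such as the NodeReport prototype.
function installSignalReporter(signal = 'SIGUSR2', write = (text) => process.stderr.write(text + '\n')) {
  let count = 0;
  let lastCpu = process.cpuUsage();
  let lastTime = Date.now();

  const listener = () => {
    count += 1;
    const cpu = process.cpuUsage(lastCpu); // microseconds since the previous report
    const now = Date.now();
    write([
      `Event: ${signal} (report ${count})  PID: ${process.pid}  Node.js: ${process.version}`,
      `CPU since previous report: user ${(cpu.user / 1000).toFixed(1)} ms, ` +
        `system ${(cpu.system / 1000).toFixed(1)} ms, over ${now - lastTime} ms elapsed`,
      'JavaScript stack at report time:',
      String(new Error('signal report').stack),
    ].join('\n'));
    lastCpu = process.cpuUsage();
    lastTime = now;
  };

  process.on(signal, listener);
  return () => process.removeListener(signal, listener);
}

// Usage in an application:
//   installSignalReporter();
// then, from a shell, send the signal a few times at intervals:
//   kill -USR2 <pid>
```

SIGUSR2 is a natural choice because Node.js reserves SIGUSR1 for starting the debugger.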

2. Analysis of core dumps from Node.js using the LLDB debugger library

Core dumps have been vital for diagnostics in production environments for many years. They provide rapid and reliable capture, by the operating system, of the entire memory image of a failing application. The failing application is typically restarted immediately and problem analysis using the core dump proceeds off-line. Tools are readily available for examining C/C++ or assembler-level information in core dumps. However, an issue with core dumps from runtimes such as Node.js and Java is that the tools need extra support to present thread stacks, heap objects and code in a form that is useful to the JavaScript or Java programmer.

There are some existing tools that address this issue for Node.js, providing the additional functionality needed to obtain JavaScript language-level information from core dumps:

Development work in the IBM Runtimes team is focused on improving the support for analysis of core dumps on Linux, Mac and Windows platforms using the llnode and lldb projects. Enhancements to the LLDB C++ SB API will allow all areas of memory in a Node.js process to be accessed from the core dump, which will in turn allow the V8 heap to be examined using the lldb debugger with the llnode plugin.
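The off-line workflow with lldb and llnode can be driven from a script. The sketch below assumes lldb is on the PATH and that the llnode plugin has already been built; the node binary, core file and plugin paths are placeholders (on macOS the plugin is a .dylib rather than a .so). It builds a batch-mode lldb command line that opens the core dump, loads llnode with ‘plugin load’, and runs the llnode commands ‘v8 bt’ (a backtrace including JavaScript frames) and ‘v8 findjsobjects’ (a summary of objects on the V8 heap). lldbArgs and analyzeCore are hypothetical helper names.

```javascript
const { spawnSync } = require('child_process');

// Build the argument list for a batch-mode lldb session on a core dump:
// load the core, load the llnode plugin, then run each command in turn.
function lldbArgs({ nodeBinary, coreFile, pluginPath, commands }) {
  const args = [nodeBinary, '--core', coreFile, '--batch', '--one-line', `plugin load ${pluginPath}`];
  for (const command of commands) {
    args.push('--one-line', command);
  }
  return args;
}

// Run lldb and return its text output. Throws (e.g. ENOENT) if lldb is not installed.
function analyzeCore(options) {
  const result = spawnSync('lldb', lldbArgs(options), { encoding: 'utf8' });
  if (result.error) throw result.error;
  return result.stdout;
}

// Placeholder paths: substitute the node binary that produced the dump,
// the core file and the location of the built llnode plugin.
const options = {
  nodeBinary: '/usr/bin/node',
  coreFile: './core.1234',
  pluginPath: './llnode.so',
  commands: ['v8 bt', 'v8 findjsobjects'],
};
console.log(['lldb', ...lldbArgs(options)].join(' '));
// console.log(analyzeCore(options)); // requires lldb, llnode and a real core file
```

Adding lldb’s own ‘bt all’ command to the list prints the native stacks of every thread alongside the llnode output.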

In addition, a prototype JavaScript API, built as a native NPM module that uses the llnode and lldb C++ libraries, shows how the content of a core dump could be accessed, viewed and analyzed from a JavaScript application:


[Figure: tooling architecture of the prototype core dump API]

The following example demonstrates a simple Node.js application using the prototype API to display the thread stacks from a Node.js core dump. The main thread stack shows that the dump was triggered by a V8 heap out-of-memory failure:


[Figure: thread stacks from a Node.js core dump, displayed using the prototype API]
