by Gireesh Punathil | Published March 26, 2019
The Diagnostic Report utility was recently brought into the Node.js core to help developers diagnose almost all kinds of Node.js application anomalies in production. The scenarios include abnormal termination such as a crash, slow performance, memory leaks, high CPU usage, unexpected errors, incorrect output, and more.
While the report does not pinpoint the exact problem or specific fixes, its content-rich diagnostic data offers vital hints about the issue and accelerates the diagnostic process.
The utility was originally available as an npm module, and was brought into the Node.js core because it significantly helps identify the root cause of many types of problems, including support issues sent to the different repositories in the Node.js organization. Before it was part of the core, users had to explicitly add the npm module as a dependency of their application, which was a barrier to adoption of the diagnostic tool.
In this blog post, I describe why this tool is important, go into some detail on how to interpret the report data, and, towards the end, walk you through some example use cases.
Typically, the starting point to diagnose a problem in an application is to:
Problem determination of Node.js deployments involves a number of different tools and methodologies. The problem itself determines the action you take to resolve the issue.
For example, if your application crashes, you would:
For an issue related to a memory leak, these steps might be different.
For production-grade deployments, the approach for diagnosing the problem that I outlined above poses a number of challenges, specifically:
The solution is a useful document that captures the most common diagnostic data pertinent to your specific execution environment. Diagnostic Report produces this using first failure data capture (FFDC). The document is in a format that both people and programs can read: you can read it in its original state if you are moderately skilled at diagnostics, load it into a JS program, or pass it to a monitoring agent.
This document can improve the overall troubleshooting experience because it:
Ideally, the FFDC enables someone to resolve the issue without any additional information!
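Because the report is a JSON document, loading one into a JS program can be sketched as follows. This is a minimal illustration, not part of the tool itself: the function names `loadReport` and `summarize` are hypothetical, and the header field names (`event`, `trigger` or `location`, `nodejsVersion`) are assumptions that may differ between Node.js releases.

```javascript
const fs = require('fs');

// Read a report file from disk and parse it into a plain object.
function loadReport(filePath) {
  return JSON.parse(fs.readFileSync(filePath, 'utf8'));
}

// Build a one-line summary from the report header.
// The field names are assumptions; missing fields fall back to 'unknown'.
function summarize(report) {
  const header = report.header || {};
  const event = header.event || 'unknown';
  const trigger = header.trigger || header.location || 'unknown';
  const version = header.nodejsVersion || 'unknown';
  return `${event} (trigger: ${trigger}) on Node.js ${version}`;
}
```

For example, `console.log(summarize(loadReport(process.argv[2])))` would print a summary of the report file named on the command line.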
Diagnostic Report is an experimental tool that is built into the Node.js core. Its function is to produce a JSON document at points of application misbehavior, or at any point where the user wants more information. The document contains information about the state of the application and the hosting platform, covering all the vital data elements.
The following command line is one of several ways to run Diagnostic Report. Here, a report is generated when an uncaught exception occurs.
$ node --experimental-report --diagnostic-report-uncaught-exception w.js
Writing Node.js report to file: report.20190309.102401.47640.001.json
Node.js report completed
A few command line arguments are available to control the report generation triggers and the report generation behaviors.
You can also generate the report explicitly through an API that is exposed on the Node.js process object. When using the API, the report is available either as a disk file or as a JSON string. Another API controls the report generation triggers and report generation behaviors.
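As a rough sketch of the in-memory route, the snippet below asks `process.report.getReport()` for the report and normalizes the result. It assumes that `process.report` may be missing (for example, when the feature is experimental and the `--experimental-report` flag was not passed) and that the return value may be either a JSON string or an already parsed object, depending on the Node.js release. The helper name `getReportObject` is hypothetical.

```javascript
// Return the diagnostic report as an object, or null when the
// report API is not available in this process.
function getReportObject(err) {
  if (!process.report || typeof process.report.getReport !== 'function') {
    return null;
  }
  const raw = err ? process.report.getReport(err) : process.report.getReport();
  // Some releases return a JSON string, others an object (assumption).
  return typeof raw === 'string' ? JSON.parse(raw) : raw;
}

const currentReport = getReportObject();
if (currentReport) {
  console.log('Report sections:', Object.keys(currentReport).join(', '));
} else {
  console.log('Diagnostic Report API is not enabled in this process.');
}
```

Guarding on `process.report` keeps the same code usable whether or not the feature is switched on, which is convenient while the API is still experimental.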
In this section, we illustrate some of the benefits of Diagnostic Report through a few different use cases. Keep in mind that this list isn’t exhaustive. Diagnostic Report is a general-purpose tool that can be used in any problem scenario.
Identify which SSL library the current Node installation is linked against (roughly identify the distribution).
In this case, you can produce a report through the process.report.writeReport() API. As the following image shows, the component versions section contains the SSL linkage information. In this case, it is linked against version 1.1.1b of openssl (line 12).
Reviewing the shared libraries that Node is linked against, you see no external SSL libraries in the list. From there, you can conclude that this is a standard community distribution.
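The same check can be sketched in code against a parsed report object. This is only a heuristic under stated assumptions: the field names `header.componentVersions.openssl` and `sharedObjects` are assumed to match the report layout, the library name pattern (`libssl`, `libcrypto`) is a guess that suits Unix-like platforms, and `describeSslLinkage` is a hypothetical helper.

```javascript
// Summarize SSL linkage from a parsed diagnostic report.
function describeSslLinkage(report) {
  const versions = (report.header && report.header.componentVersions) || {};
  const sharedObjects = report.sharedObjects || [];
  // Heuristic: any loaded shared object whose path mentions
  // libssl or libcrypto is treated as an external SSL library.
  const externalSslLibraries = sharedObjects.filter((lib) =>
    /libssl|libcrypto/i.test(lib));
  return {
    opensslVersion: versions.openssl || null,
    externalSslLibraries,
  };
}
```

An empty `externalSslLibraries` array alongside a non-null `opensslVersion` corresponds to the conclusion drawn above: the SSL library is the one bundled with the binary.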
A Node application hangs when you expect it to complete some tasks and then terminate. You have no idea what is keeping the event loop alive.
In this case, produce the report by sending a SIGUSR2 signal to the running process.
The generated report shows an active timer handle in the event loop that is set to expire around 10 hours from the current time. (You can see that on line 4; “firesInMsFromNow” shows how many milliseconds remain until the timer fires.)
Because the application should not be scheduling an event 10 hours in the future, you now understand the reason for the hang. To fix it, search the application for the code that installs a setTimeout handler with that duration.
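Spotting such a timer can also be automated once the report is parsed. The sketch below assumes that the handles are listed in a `libuv` array, that each entry carries `type` and `is_active`, and that timers carry a numeric field spelled `firesInMsFromNow`. These names are assumptions about the report layout, and `findLongTimers` is a hypothetical helper.

```javascript
// Return active timer handles that will not fire for longer
// than thresholdMs milliseconds.
function findLongTimers(report, thresholdMs) {
  return (report.libuv || []).filter((handle) =>
    handle.type === 'timer' &&
    handle.is_active === true &&
    typeof handle.firesInMsFromNow === 'number' &&
    handle.firesInMsFromNow > thresholdMs);
}

// Example: flag timers scheduled more than one hour ahead.
const ONE_HOUR_MS = 60 * 60 * 1000;
const sampleReport = {
  libuv: [
    { type: 'async', is_active: true },
    { type: 'timer', is_active: true, firesInMsFromNow: 36000000 },
    { type: 'timer', is_active: true, firesInMsFromNow: 250 },
  ],
};
console.log(findLongTimers(sampleReport, ONE_HOUR_MS));
```

The threshold is a judgment call: it should be longer than any delay the application legitimately schedules.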
You want to make sure a web application that you host in a cloud environment is idle outside of business hours.
In this case, you again produce a report, this time by logging into the cloud instance through SSH and sending a signal to the running process. The report generated in the persistent volume shows the resource usage section:
Lines two and three show the time spent by the node process in user space and kernel space. Because only a fraction of a second was spent in each, you can be confident that the application is relatively idle, with no file system activity in the recent past.
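A monitoring script could apply the same reasoning to a parsed report. The sketch below assumes that the resource usage section is exposed as `resourceUsage` with numeric `userCpuSeconds` and `kernelCpuSeconds` fields. Those names are assumptions about the report layout, and `cpuSecondsUsed` and `looksIdle` are hypothetical helpers.

```javascript
// Extract user-space and kernel-space CPU time from a parsed report.
// Missing or non-numeric values are treated as zero.
function cpuSecondsUsed(report) {
  const usage = report.resourceUsage || {};
  return {
    user: Number(usage.userCpuSeconds) || 0,
    kernel: Number(usage.kernelCpuSeconds) || 0,
  };
}

// Treat the process as idle when its total CPU time is below the limit.
function looksIdle(report, maxCpuSeconds) {
  const { user, kernel } = cpuSecondsUsed(report);
  return user + kernel < maxCpuSeconds;
}
```

Because these counters describe the whole lifetime of the process, comparing two reports taken some time apart gives a better picture of recent activity than a single reading.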
Diagnostic Report is available as an experimental feature in Node.js v11.8.0 and subsequent releases. The tool could exit the experimental status and become a stable and supported feature, based on:
Again, this is based on user feedback.
In software development, ‘feature freeze’ is the inability to refine interfaces because they are already widely used in the field and have many software abstractions built on top of them; any change to the interfaces can break all of them.
To avoid feature freeze with our Diagnostic Report tool, we ask that you evaluate this feature as soon as you get an opportunity and provide your valuable feedback directly in the Node.js Diagnostic User Feedback repo.
This tutorial reflects the Node.js API as of v11.12.0.