Last week I had the privilege of hosting the first Node.js community Diagnostics Summit at the IBM offices in Ottawa. Twenty-four dedicated Node.js contributors from around the world braved the cold weather and met to discuss diagnostics over two days.

Here are a couple of pictures of the attendees. They are almost identical, except that I’m in one of them and taking the other:



I see the broad interest and commitment to this effort as a sign that the Node.js community will have a significant focus on diagnostic tooling this year. This matters because diagnostic tooling will only become more critical as Node.js is increasingly used in large-scale production deployments.

To set the context for the summit, we started the day with Netflix, which runs one of the existing large-scale Node.js deployments. They gave us their perspective on why diagnostic tooling is important and on some of the challenges they face today.

The group then did a quick pass through each of the key areas so that we could decide what to cover in the breakout sessions. The key areas included:

  • Post-mortem debugging
  • Tracing and monitoring
  • Debugging
  • Platform-neutral APIs
  • Testing and documentation

Our goal was to document the next actions and desired outcomes in each of these areas, both as a basis for discussion in the breakout sessions and for future work.

Some of the desired outcomes included:

  • Defining the key diagnostic use cases, then building a matrix which shows the set of tools that supports each of the use cases.
  • Defining the level of integration with core that tools supporting these use cases should have, and assessing the current state of their CI testing and documentation.
  • Identifying additional metadata or support that can be integrated into Node.js core to support diagnostic tools.
  • Documenting the next steps for improving post-mortem debugging with promises.
  • Defining the use cases for trace events and the next steps in harmonizing existing probes, which are implemented using a number of different frameworks (DTrace, LTTng, etc.).
  • Defining the plan for making trace event generation from JavaScript fast.
  • Defining the influence that failing diagnostic tooling tests should have on the release process.
  • Defining the plan for fixing CPU profiling.
  • Defining the plan for achieving faster stack traces.
  • Defining the use cases and requirements for Loader Hooks.
  • Defining the list of metrics, and the APIs that should expose them.
  • Defining the use cases for Async Hooks, the API changes that would allow JavaScript engines to make them operate faster, and next steps to investigate performance characteristics.
  • Defining a model for asynchronous execution and context.
  • Defining next steps in making support for diagnostic tooling platform-neutral.
  • Defining next steps in implementing platform-neutral time travel debugging.

As the list shows, there is a lot of work to do. We only had time to make a small start during the breakout sessions. Now that we are all back to our regular day-to-day routines, it is important that we capitalize on the energy from the in-person discussions to move this work forward.

The first step is to capture the work, next steps, and action owners from the breakout sessions. This is being coordinated in this issue, which provides a good top-level view of the issues and discussions you might want to follow or get involved in.

It was great to see active participation from JavaScript engine developers (from both Google and Microsoft), application performance monitoring vendors, Node.js core collaborators, and end users like Netflix. This mix of attendees let us brainstorm solutions that might require changes outside of Node.js, and we came up with next steps for exploring improved post-mortem debugging with promises, faster stack traces, fast trace events from JavaScript, and more.

I’d also like to thank a few of the people who gave presentations, both planned and impromptu:

  • Joyee – llnode update
  • Matheus – CPU profiling issues and approach
  • Mark – Time travel debugging
  • Andreas – demo of node-clinic tooling

These presentations added valuable context and information for the surrounding discussions. I’d also like to thank Mike Kaufman for helping to organize the summit and moderating a number of the sessions.

Overall, I believe the event was a success, and I look forward to the next one. We might choose somewhere warmer next time (how about Ottawa in the summer? Just kidding).

In closing, if you want to contribute to the Node.js community, now is a good time to get involved in Node.js diagnostics work. You can follow the issues opened during the summit, including those summarizing the next steps and actions from the breakout sessions, or make your own suggestions. If you need help jumping in, contact me, Mike Kaufman, or any of the other attendees for suggestions on how to get started. We are always looking to enable new contributors!

2 comments on "Node.js Community Diagnostics Summit Update"

  1. Hi All, I would like to get involved in Node.js diagnostics work.

    Thank you!!

  2. Yassire, I suggest you attend the next Diagnostics WG meeting to see how to get involved. It is scheduled for March 7 at 4 EST. Watch the repo here for an issue with the details of how to join: https://github.com/nodejs/diagnostics
