Mine insights from software development artifacts

Note: This pattern is part of a composite pattern. The code patterns in a composite pattern can be stand-alone applications or a continuation of another code pattern. This composite pattern consists of the code patterns named in the numbered steps below.


A great deal of unstructured text content is generated in every domain, including the software development lifecycle, finance, healthcare, and social media. Valuable insights can be generated by analyzing this content and correlating the information across document sources. This pattern uses Watson Natural Language Understanding, the Python Natural Language Toolkit (NLTK), OrientDB, Node-RED, and IBM Watson Studio to build a complete analytics solution that generates insights for informed decision-making.


This composite pattern uses a combination of other code patterns to derive insights from unstructured text content across various data sources. It is intended for developers who want a head start in building end-to-end solutions for such insights. This composite pattern demonstrates an insight methodology using IBM Cloud, Watson services, Python NLTK, OrientDB, and IBM Watson Studio.



  1. The unstructured text data that needs to be analyzed and correlated is extracted from the documents using custom Python code.
  2. The text is classified and tagged using the code pattern Extend Watson text classification.
  3. The text is correlated with other text using the code pattern Correlate documents.
  4. The document data and correlations are stored in the OrientDB database using the code pattern Store, graph, and derive insights from interconnected data.
  5. The analytics solution on IBM Watson Studio is invoked and its results are visualized using the code pattern Orchestrate data science workflows using Node-RED.
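
As a rough illustration of how steps 1 through 4 fit together, the following is a minimal sketch in plain Python. It is not the code from the component patterns: keyword lookup stands in for Watson Natural Language Understanding and NLTK tagging, a Jaccard overlap of tags stands in for the correlation logic, and an in-memory dictionary stands in for OrientDB. The tag vocabulary, the sample documents, and all function names are hypothetical. Step 5 (Node-RED orchestration and visualization) is not covered.

```python
import re
from itertools import combinations

# Hypothetical tag vocabulary. In the pattern itself, tagging is done by
# Watson Natural Language Understanding together with NLTK-based rules.
TAG_KEYWORDS = {
    "login": "authentication",
    "password": "authentication",
    "timeout": "performance",
    "latency": "performance",
    "payment": "billing",
}


def extract_text(raw):
    """Step 1 stand-in: collapse runs of whitespace in raw document text."""
    return re.sub(r"\s+", " ", raw).strip()


def tag_text(text):
    """Step 2 stand-in: tag the text by simple keyword lookup."""
    words = re.findall(r"[a-z]+", text.lower())
    return {TAG_KEYWORDS[w] for w in words if w in TAG_KEYWORDS}


def correlate(tags_by_doc, threshold=0.5):
    """Step 3 stand-in: link documents whose tag sets overlap (Jaccard index)."""
    edges = []
    for a, b in combinations(sorted(tags_by_doc), 2):
        union = tags_by_doc[a] | tags_by_doc[b]
        if not union:
            continue  # neither document has tags, nothing to compare
        score = len(tags_by_doc[a] & tags_by_doc[b]) / len(union)
        if score >= threshold:
            edges.append((a, b, round(score, 2)))
    return edges


def build_graph(tags_by_doc, edges):
    """Step 4 stand-in: an in-memory graph in place of the OrientDB database."""
    graph = {
        doc: {"tags": sorted(tags), "related": []}
        for doc, tags in tags_by_doc.items()
    }
    for a, b, score in edges:
        graph[a]["related"].append((b, score))
        graph[b]["related"].append((a, score))
    return graph


# Hypothetical software development artifacts.
documents = {
    "requirement-1": "Users must  login with a password.",
    "defect-7": "Login fails after password reset.",
    "test-3": "Measure payment page latency.",
}

tags_by_doc = {name: tag_text(extract_text(raw)) for name, raw in documents.items()}
graph = build_graph(tags_by_doc, correlate(tags_by_doc))
for name, node in graph.items():
    print(name, node)
```

In this sketch the requirement and the defect share the "authentication" tag and are linked, while the test document stays unconnected. A real deployment would replace each stand-in with the corresponding service or code pattern.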