Extract insights from videos using IBM Watson

Part of the World Health Organization's guidance on limiting further spread of COVID-19 is to practice social distancing. As a result, companies in the most affected areas are taking precautionary measures by encouraging work from home, and educational institutions are closing their facilities. Employees working from home must stay aware of what's happening in their company and be able to collaborate with their teams. Likewise, students learning remotely must keep up with their coursework.

With the help of technology, employees can continue to collaborate and stay involved with their work through virtual meetings, and schools and teachers can continue to engage with their students through virtual classrooms. These meetings can be recorded, and deriving insights from the recordings can be valuable to users. Toward that goal, this solution explains:

  • How to extract audio from video recordings
  • How to build a custom speech-to-text model that can produce diarized textual output of the audio
  • How to use advanced natural language processing along with IBM® Watson™ Tone Analyzer to extract insights from the textual file

This solution demonstrates how to extract insights from videos, specifically from meetings and classroom videos. It provides insights such as category, entity, concept, keywords, sentiments, emotions, top positive sentences, and word clouds. The following image shows an overview of the different parts of this solution.

[Image: sample output]

Architecture flow

[Image: architecture diagram]

  1. The user uploads a recorded video file of the virtual meeting or a virtual classroom in the application.
  2. The FFmpeg library extracts audio from the video file.
  3. The extracted audio is stored in IBM Cloud Object Storage.
  4. The Watson Speech to Text service transcribes the audio to give a diarized textual output.
  5. Tone Analyzer analyzes the transcript and picks up the top positive statements from the transcript.
  6. Watson Natural Language Understanding analyzes the transcript to identify key pointers and extract the sentiments and emotions.
  7. The key pointers and a summary of the video are then presented to the user in the application and stored in IBM Cloud Object Storage.
  8. The user can then download the textual insights.
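Step 2 of the flow uses the FFmpeg library to pull the audio track out of the uploaded video. As a rough sketch of that step (the output format and sample-rate flags are assumptions chosen to match what Speech to Text accepts, not settings taken from the code patterns):

```python
import subprocess
from pathlib import Path

def build_ffmpeg_command(video_path: str, audio_path: str) -> list[str]:
    """Build an ffmpeg invocation that drops the video stream (-vn)
    and writes only the audio track to audio_path."""
    return [
        "ffmpeg",
        "-i", video_path,        # input video file
        "-vn",                   # discard the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM audio (WAV)
        "-ar", "16000",          # 16 kHz sample rate
        "-ac", "1",              # mono channel
        audio_path,
    ]

def extract_audio(video_path: str) -> str:
    """Run ffmpeg to produce a WAV file alongside the video."""
    audio_path = str(Path(video_path).with_suffix(".wav"))
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)
    return audio_path
```

Extracting to mono 16 kHz WAV keeps the file in a format the Watson Speech to Text service can transcribe directly.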

The code patterns in this solution are built around data for the IBM Earnings Call Q1 2019 meeting recording. We analyze the earnings call and generate textual insights based on the video.

The code patterns explain how to combine Speech To Text services with Watson Natural Language Understanding and Tone Analyzer to generate textual insights from a video.

Extract audio from video


In the Extract audio from video code pattern, you learn the steps to:

  • Create an IBM Cloud Object Storage bucket
  • Upload the video files to the bucket
  • Extract audio from the video files and store it in the bucket
  • Download the audio files
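The upload and download steps above go through IBM Cloud Object Storage. A minimal sketch, assuming the `ibm-cos-sdk` package and placeholder credentials (bucket name, API key, instance ID, and endpoint are all values you supply, not values from the code pattern):

```python
from pathlib import Path

def audio_object_key(video_key: str) -> str:
    """Derive the bucket key for the extracted audio from the video's key,
    e.g. "meetings/earnings-q1.mp4" becomes "meetings/earnings-q1.wav"."""
    return str(Path(video_key).with_suffix(".wav"))

def upload_file(bucket: str, key: str, local_path: str,
                api_key: str, instance_id: str, endpoint: str) -> None:
    """Upload a local file to an IBM Cloud Object Storage bucket.
    Requires the ibm-cos-sdk package (pip install ibm-cos-sdk)."""
    import ibm_boto3
    from ibm_botocore.client import Config

    cos = ibm_boto3.client(
        "s3",
        ibm_api_key_id=api_key,
        ibm_service_instance_id=instance_id,
        config=Config(signature_version="oauth"),
        endpoint_url=endpoint,
    )
    cos.upload_file(Filename=local_path, Bucket=bucket, Key=key)
```

Because IBM Cloud Object Storage exposes an S3-compatible API, the same client's `download_file` call covers the download step.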

Build custom speech-to-text model with speaker diarization capabilities


In the Build a custom speech-to-text model with speaker diarization capabilities code pattern, you learn the steps to:

  • Train a custom language model with a corpus file
  • Train a custom acoustic model with audio files from the bucket
  • Transcribe the audio files from the bucket and get a diarized textual output
  • Store the transcript in the bucket
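The transcription call itself is made through the Watson Speech to Text API with `speaker_labels=True`; the service then returns per-word timestamps alongside a `speaker_labels` array. A sketch of how the two can be merged into a diarized transcript (the response field names match the Speech to Text JSON schema, but the merging logic is an illustrative assumption, not the code pattern's implementation):

```python
def diarize(stt_response: dict) -> list[tuple[int, str]]:
    """Combine word timestamps with speaker_labels from a Speech to Text
    JSON response into ordered (speaker, utterance) pairs."""
    # Map each word's start time to the speaker identified for it
    speaker_at = {lbl["from"]: lbl["speaker"]
                  for lbl in stt_response.get("speaker_labels", [])}
    segments: list[tuple[int, str]] = []
    for result in stt_response.get("results", []):
        for word, start, _end in result["alternatives"][0].get("timestamps", []):
            speaker = speaker_at.get(start, -1)
            if segments and segments[-1][0] == speaker:
                # Same speaker as the previous word: extend the utterance
                segments[-1] = (speaker, segments[-1][1] + " " + word)
            else:
                segments.append((speaker, word))
    return segments
```

The custom language model (trained on the corpus file) and custom acoustic model (trained on the bucket's audio) improve transcription accuracy on domain vocabulary before this merging step runs.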

Use advanced natural language processing and tone analysis to extract meaningful insights


In the Use advanced natural language processing and tone analysis to extract meaningful insights code pattern, you learn the steps to:

  • Load the transcript file from the bucket
  • Select the entities to be extracted from the transcript
  • Get a natural language understanding report with entity, concept, category, keywords, sentiments, and emotions with advanced natural language processing
  • Get top five positive sentences with Tone Analyzer
  • Get a word cloud based on nouns, adjectives, and verbs
  • Print the natural language understanding report

Extract insights from videos


In the Extract insights from videos code pattern, you learn the steps to:

  • Upload any video file to the application
  • Get diarized textual output for the video file
  • Get a natural language understanding report with entity, concept, category, keywords, sentiments, and emotions with advanced natural language processing
  • Get top five positive sentences with Tone Analyzer
  • Get a word cloud based on nouns, adjectives, and verbs
  • Print the natural language understanding report
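The word cloud in the steps above is built only from nouns, adjectives, and verbs. A sketch of the frequency-counting part, assuming the transcript has already been part-of-speech tagged by a library such as NLTK using universal tags (the tag names and the helper itself are illustrative, not taken from the code pattern):

```python
from collections import Counter

def wordcloud_frequencies(tagged_words: list[tuple[str, str]],
                          keep_tags: frozenset = frozenset({"NOUN", "ADJ", "VERB"})) -> Counter:
    """Count how often each noun, adjective, or verb appears. The resulting
    counts can feed a word-cloud renderer's frequency-based generator."""
    return Counter(word.lower() for word, tag in tagged_words if tag in keep_tags)
```

Filtering out determiners, pronouns, and other function words keeps the cloud focused on content-bearing terms from the meeting.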

You can also see videos for all four code patterns in the Extract Insights from Videos with IBM Watson playlist on YouTube.