
Extract audio from video

Summary

In this code pattern, learn how to take a video recording of a virtual meeting or a virtual classroom, extract its audio, and store that audio in IBM® Cloud Object Storage.

Description

The first step in getting insights from a video is to extract its audio and store it in a commonly accessible storage location. This code pattern shows how to take a video recording of a meeting and extract its audio by using the open source FFmpeg library in a Python Flask runtime. FFmpeg is a complete, cross-platform solution to record, convert, and stream audio and video. You then store the extracted audio in IBM Cloud Object Storage, a highly scalable cloud storage service that is designed for high durability, resiliency, and security. The stored audio files are processed further in the next code pattern of the Extracting insights from videos with IBM Watson solution, which performs speaker diarization on them.
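
As a minimal sketch of the extraction step, the following calls the ffmpeg command-line tool through Python's subprocess module. This is an assumption: the pattern's own code may call FFmpeg through a wrapper library instead, and the mono/16 kHz settings and the function names are illustrative choices, not taken from the repository. The -vn flag drops the video stream so that only audio is written, and ffmpeg picks the output format from the file extension.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path: str, audio_path: str) -> list[str]:
    """Return an ffmpeg argument list that keeps only the audio stream."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", video_path,  # input video recording
        "-vn",             # drop the video stream, keep audio only
        "-ac", "1",        # downmix to mono (assumed setting)
        "-ar", "16000",    # resample to 16 kHz (assumed setting)
        audio_path,        # output format is inferred from the extension
    ]


def extract_audio(video_path: str, audio_path: str) -> Path:
    """Run ffmpeg and return the path of the extracted audio file."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg executable not found on PATH")
    # check=True raises CalledProcessError if ffmpeg exits with a non-zero status.
    subprocess.run(build_ffmpeg_command(video_path, audio_path),
                   check=True, capture_output=True)
    return Path(audio_path)
```

For example, extract_audio("meeting.mp4", "meeting.flac") would write a mono FLAC file next to the recording. Building the argument list separately keeps the command easy to inspect and test without running ffmpeg.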

After you’ve completed this code pattern, you’ll understand how to:

  • Create an IBM Cloud Object Storage bucket
  • Upload the video files to the bucket
  • Extract audio from the video files and store it in the bucket
  • Connect Flask applications directly to IBM Cloud Object Storage
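
The bucket operations above can be sketched with the IBM Cloud Object Storage Python SDK (ibm-cos-sdk, imported as ibm_boto3), which exposes an S3-compatible client. This is a hedged sketch, not the repository's code: the helper names are hypothetical, the region and bucket names are placeholders, and the "<region>-standard" location constraint assumes the Standard storage class. The SDK is imported inside make_cos_client so the other helpers can be used without it.

```python
from __future__ import annotations


def cos_endpoint(region: str) -> str:
    """Public regional endpoint, for example for region "us-south"."""
    return f"https://s3.{region}.cloud-object-storage.appdomain.cloud"


def make_cos_client(api_key: str, service_instance_id: str, region: str):
    """Create a low-level S3-compatible client with the IBM COS Python SDK."""
    # Imported lazily so the helpers below work without the SDK installed.
    import ibm_boto3
    from ibm_botocore.client import Config

    return ibm_boto3.client(
        "s3",
        ibm_api_key_id=api_key,
        ibm_service_instance_id=service_instance_id,
        config=Config(signature_version="oauth"),  # IAM token authentication
        endpoint_url=cos_endpoint(region),
    )


def create_bucket(client, bucket: str, location_constraint: str) -> None:
    """Create a bucket; location_constraint is e.g. "us-south-standard"."""
    client.create_bucket(
        Bucket=bucket,
        CreateBucketConfiguration={"LocationConstraint": location_constraint},
    )


def upload_file(client, local_path: str, bucket: str, key: str) -> None:
    """Upload a local file (a video or extracted audio) as object `key`."""
    client.upload_file(Filename=local_path, Bucket=bucket, Key=key)
```

In a Flask application, the client would typically be created once at startup and reused by the request handlers, which is one way to connect the app directly to Cloud Object Storage.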

Flow

Extract audio from video and store it in Cloud Object Storage

  1. The user uploads the video file to the application.
  2. The FFmpeg library extracts the audio from the video file.
  3. The extracted audio file is stored in IBM Cloud Object Storage.
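
The three flow steps can be sketched as one framework-independent function. This is an illustrative assumption rather than the application's actual structure: process_upload and audio_key_for are hypothetical names, the ".flac" naming rule is a placeholder, and the extraction and storage steps are passed in as callables (for example, a wrapper around an FFmpeg subprocess call and a wrapper around a Cloud Object Storage client upload) so the flow can be exercised without FFmpeg or cloud credentials.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Callable

Extractor = Callable[[Path, Path], None]  # (video_path, audio_path) -> None
Uploader = Callable[[Path, str], None]    # (local_audio_path, object_key) -> None


def audio_key_for(video_filename: str, ext: str = ".flac") -> str:
    """Hypothetical naming rule: "recordings/meeting.mp4" -> "meeting.flac"."""
    return Path(video_filename).with_suffix(ext).name


def process_upload(video_bytes: bytes, video_filename: str,
                   extract: Extractor, upload: Uploader) -> str:
    """Run flow steps 1-3 for one uploaded recording and return the object key."""
    key = audio_key_for(video_filename)
    with tempfile.TemporaryDirectory() as tmp:
        # Step 1: persist the uploaded video under a fixed local name, so a
        # user-supplied filename never controls the local path.
        video_path = Path(tmp) / ("video" + Path(video_filename).suffix)
        video_path.write_bytes(video_bytes)
        # Step 2: extract the audio track (for example, with FFmpeg).
        audio_path = Path(tmp) / ("audio" + Path(key).suffix)
        extract(video_path, audio_path)
        # Step 3: store the extracted audio in Cloud Object Storage under `key`.
        upload(audio_path, key)
    # The temporary directory and both local files are deleted at this point.
    return key
```

In a Flask route, the uploaded file object from request.files (the form field name, such as "video", is up to the application) provides the bytes through read() and the original name through filename, which could be passed straight to process_upload.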

Instructions

Find the detailed steps for this pattern in the README file. Those steps show you how to:

  1. Clone the GitHub repository.
  2. Create the IBM Cloud Object Storage service.
  3. Add the credentials to the application.
  4. Deploy the application.
  5. Run the application.
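
The README describes how the credentials are added in step 3. As one possible approach, and purely as an assumption, the application could read them from environment variables at startup and fail fast if any are missing. The variable names and the CosSettings type below are hypothetical; the api_key and instance_id values correspond to the apikey and resource_instance_id fields of the Cloud Object Storage service credentials.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CosSettings:
    api_key: str      # "apikey" from the service credentials
    instance_id: str  # "resource_instance_id" from the service credentials
    endpoint: str     # e.g. a regional public endpoint URL
    bucket: str       # bucket that holds the videos and extracted audio


# Hypothetical environment variable names; adapt them to the README's setup.
ENV_VARS = {
    "api_key": "COS_API_KEY",
    "instance_id": "COS_INSTANCE_ID",
    "endpoint": "COS_ENDPOINT",
    "bucket": "COS_BUCKET",
}


def load_settings(env: Mapping[str, str] = os.environ) -> CosSettings:
    """Read COS settings, listing every missing variable in one error."""
    missing = [var for var in ENV_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return CosSettings(**{field: env[var] for field, var in ENV_VARS.items()})
```

Keeping credentials out of the source tree means the same code can run locally and when deployed, with only the environment changing between the two.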

This code pattern is part of the Extracting insights from videos with IBM Watson use case series, which showcases a solution for extracting meaningful insights from videos by using the Watson Speech to Text, Watson Natural Language Processing, and Watson Tone Analyzer services.