Train and evaluate an audio classifier

Summary

This developer code pattern guides you through training a deep learning model to classify audio embeddings on IBM’s Deep Learning as a Service (DLaaS) platform, Watson™ Machine Learning, and then running inference and evaluation on IBM Watson Studio.

Description

Let’s say you have a huge collection of unlabeled or uncategorized music. A classifier trained on existing music genres can organize such files efficiently and make them usable in recommender systems. Or perhaps you use digital assistants like Google Home or Amazon Alexa. They recognize speech well, but a well-trained audio classifier could extend their capabilities beyond speech and conversation.

The model you will create takes audio embeddings as input and outputs a probability score for each of 527 classes. The classes cover a broad range of sounds, such as speech, music genres, natural sounds like rain and lightning, and automobiles. The full list of sound classes is in the AudioSet Ontology. The model accepts embeddings of 10-second audio clips rather than the raw audio itself. You can generate the embeddings from raw audio with the VGGish model, which converts each second of audio into an embedding vector of length 128, so a 10-second clip becomes a tensor of shape 10×128 that serves as the classifier input.

To illustrate the concept and introduce the features of the IBM Cloud platforms, this pattern uses Google’s AudioSet data, for which the embeddings are already computed and readily available. Although AudioSet data is used here, you can use the same model to build a custom audio classifier trained on your own audio data.
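To make the input and output shapes concrete, here is a minimal sketch in plain Python. It is not the model trained in this pattern: the embeddings are random stand-ins for VGGish output, the weights are untrained, and the architecture (mean pooling over time, a single linear layer, and an independent sigmoid per class) is an assumption chosen only to show a 10×128 input turning into 527 scores. All function names are hypothetical.

```python
import math
import random

NUM_SECONDS = 10      # one embedding per second of audio (from the text)
EMBEDDING_DIM = 128   # VGGish embedding length (from the text)
NUM_CLASSES = 527     # number of AudioSet classes (from the text)


def make_dummy_embeddings(rng):
    """Random stand-in for VGGish output: a 10 x 128 nested list of floats."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]
            for _ in range(NUM_SECONDS)]


def make_random_weights(rng):
    """Hypothetical, untrained 128 x 527 weight matrix and 527 zero biases."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(NUM_CLASSES)]
               for _ in range(EMBEDDING_DIM)]
    biases = [0.0] * NUM_CLASSES
    return weights, biases


def classify(embeddings, weights, biases):
    """Mean-pool over time, apply one linear layer, then a sigmoid per class."""
    pooled = [sum(frame[d] for frame in embeddings) / len(embeddings)
              for d in range(EMBEDDING_DIM)]
    scores = []
    for c in range(NUM_CLASSES):
        logit = biases[c] + sum(pooled[d] * weights[d][c]
                                for d in range(EMBEDDING_DIM))
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


def top_k(scores, k=5):
    """Return (class_index, score) pairs for the k highest scores."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


rng = random.Random(0)
clip = make_dummy_embeddings(rng)
weights, biases = make_random_weights(rng)
scores = classify(clip, weights, biases)
print(len(clip), len(clip[0]), len(scores))  # prints: 10 128 527
```

Calling `top_k(scores)` returns the five highest-scoring class indices, which you would map to names using the AudioSet Ontology. With untrained weights the ranking is meaningless; the point is only the shape of the data flowing through the classifier.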

When you have completed this code pattern, you will understand how to:

  • Set up an IBM Cloud Object Storage bucket and upload the training data to the cloud.
  • Upload a deep learning model to Watson Machine Learning for training.
  • Integrate the Object Storage buckets into IBM Watson Studio.
  • Perform inference on an evaluation dataset using Jupyter Notebooks in IBM Watson Studio.

Flow

  1. Upload the training files to Object Storage.
  2. Train the model on Watson Machine Learning.
  3. Transfer the trained model weights to a new bucket on IBM Cloud and link it to IBM Watson Studio.
  4. Upload and run the attached Jupyter notebook on Watson Studio to perform inference.

Instructions

Ready to put this code pattern to use? Complete details on how to get started running and using this application are in the README.