by Karthik Muthuraman | Published July 13, 2018
Artificial intelligence, Data Science, Deep Learning, Machine Learning, Object Storage
This developer code pattern will guide you through training a deep learning model to classify audio embeddings on IBM’s Deep Learning as a Service (DLaaS) platform – Watson™ Machine Learning – and performing inference/evaluation on IBM Watson Studio.
Let’s say you have a huge collection of unlabeled or uncategorized music. A classifier trained on existing music genres can efficiently organize such files and incorporate them into recommender systems. Or perhaps you use digital assistants like Google Home or Amazon Alexa. They have great speech recognition abilities, but a well-trained audio classifier could make them smarter by extending their capabilities beyond speech and conversation to other kinds of sound.
The model we will create will use audio embeddings as an input and generate output probabilities/scores for 527 classes. The classes cover a broad range of sounds like speech, music genres, natural sounds like rain/lightning, automobiles, etc. The full list of sound classes can be found at AudioSet Ontology. The model accepts embeddings of 10-second audio clips, as opposed to the raw audio itself. The embedding vectors for raw audio can be generated using the VGGish model, which converts each second of raw audio into an embedding (vector) of length 128, resulting in a tensor of shape 10×128 as the input for the classifier. To illustrate the concept and expose you to the features on IBM Cloud platforms, Google’s AudioSet data is used, where the embeddings have been preprocessed and are readily available. Though AudioSet data is used here, you can leverage this model to create your own custom audio classifier trained on your own audio data.
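To make the input and output shapes concrete, here is a minimal sketch in Python with NumPy. It is not the model built in this code pattern: the `classify_clip` function, the mean pooling over time, the single linear layer, and the random weights are all hypothetical stand-ins chosen for illustration. Only the dimensions (10 seconds, 128 values per second, 527 classes) come from the description above. A sigmoid is assumed for the output so that each class gets an independent score, since an AudioSet clip can carry more than one label.

```python
import numpy as np

SECONDS_PER_CLIP = 10   # each clip is 10 seconds long (from the text)
EMBEDDING_SIZE = 128    # VGGish yields one 128-value vector per second (from the text)
NUM_CLASSES = 527       # number of AudioSet classes scored (from the text)


def classify_clip(embeddings, weights, bias):
    """Hypothetical stand-in classifier, not the pattern's actual architecture.

    Averages the per-second embeddings, applies one linear layer,
    and squashes each class logit independently with a sigmoid.
    """
    if embeddings.shape != (SECONDS_PER_CLIP, EMBEDDING_SIZE):
        raise ValueError(f"expected shape (10, 128), got {embeddings.shape}")
    pooled = embeddings.mean(axis=0)          # shape (128,)
    logits = pooled @ weights + bias          # shape (527,)
    return 1.0 / (1.0 + np.exp(-logits))      # per-class scores in (0, 1)


# Random data stands in for real VGGish embeddings and trained weights.
rng = np.random.default_rng(0)
clip = rng.normal(size=(SECONDS_PER_CLIP, EMBEDDING_SIZE))
weights = rng.normal(scale=0.01, size=(EMBEDDING_SIZE, NUM_CLASSES))
bias = np.zeros(NUM_CLASSES)

scores = classify_clip(clip, weights, bias)
top5 = np.argsort(scores)[::-1][:5]           # indices of the 5 highest-scoring classes
print(scores.shape, top5)
```

With a trained model, the indices in `top5` would be mapped to class names through the AudioSet ontology; with the random weights used here they carry no meaning.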
When you have completed this code pattern, you will understand how to:
Ready to put this code pattern to use? Complete details on how to get started running and using this application are in the README.