By IBM Developer Staff | Updated September 21, 2018 - Published July 12, 2018
This model takes a signed 16-bit PCM WAV file as input, generates embeddings, uses the
embeddings as input to a multi-attention classifier, and outputs the top 5 class predictions
with their probabilities. The model currently supports 527 classes, which are part of the
AudioSet Ontology. The classes and their label_ids can be
found in class_labels_indices.csv. The model was trained on
AudioSet as described in the paper
‘Multi-level Attention Model for Weakly Supervised Audio Classification’ by Yu et al.
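Because the model expects signed 16-bit PCM WAV input, it can be useful to check a file before sending it. The following is a minimal sketch using only Python's standard `wave` module; the helper names `describe_wav` and `is_16bit_pcm` are hypothetical and not part of the model's code. It relies on the fact that 16-bit PCM samples in a WAV file are signed, so a 2-byte sample width in an uncompressed PCM file is treated as sufficient.

```python
import wave


def describe_wav(source):
    """Return basic format facts for a WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
            "frames": wav.getnframes(),
        }


def is_16bit_pcm(source):
    """True when the input parses as uncompressed PCM with 2-byte samples.

    The wave module raises wave.Error for files it cannot parse as PCM,
    and EOFError for empty or truncated input; both are reported as False.
    """
    try:
        return describe_wav(source)["sample_width_bytes"] == 2
    except (wave.Error, EOFError):
        return False
```

For example, `is_16bit_pcm("assets/thunder.wav")` could be called before posting the file to the model. A missing file still raises `FileNotFoundError`, which is left to the caller.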
The model has been tested across multiple audio classes; however, it tends to perform best on the Music and Speech categories.
This is largely due to the bias towards these classes in the training dataset (90% of the audio belongs to one of these
categories). Although the model was trained on AudioSet data, which was extracted from YouTube videos, it can be
applied to a wide range of audio files outside the domain of music and speech. The test assets provided with this
model cover a broad range.
Jort F. Gemmeke, Daniel P. W. Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, R. Channing Moore, Manoj Plakal, Marvin Ritter, “Audio set: An ontology and human-labeled dataset for audio events”, IEEE ICASSP, 2017.
Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley, “Audio Set classification with attention model: A probabilistic perspective”, arXiv preprint arXiv:1711.00927 (2017).
Changsong Yu, Karim Said Barsim, Qiuqiang Kong, Bin Yang, “Multi-level Attention Model for Weakly Supervised Audio Classification”, arXiv preprint arXiv:1803.02353 (2018).
S. Hershey, S. Chaudhuri, D. P. W. Ellis, J. F. Gemmeke, A. Jansen, R. C. Moore, M. Plakal, D. Platt, R. A. Saurous, B. Seybold et al., “CNN architectures for large-scale audio classification”, arXiv preprint arXiv:1609.09430, 2016.
To deploy from Docker Hub:
docker run -it -p 5000:5000 codait/max-audio-classifier
To deploy on Kubernetes:
kubectl apply -f https://raw.githubusercontent.com/IBM/MAX-Audio-Classifier/master/max-audio-classifier.yaml
You can test or use this model from the command line or in a Node-RED flow.
Once deployed, you can test the model from the command line. For example, if the model is running locally:
curl -F "audio=@assets/thunder.wav" -XPOST http://localhost:5000/model/predict
"label": "Rain on surface",
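The same request can be sketched in Python using only the standard library. The endpoint URL and the `audio` form field are taken from the curl example above. Everything else is an assumption: the helper names are hypothetical, and the response is assumed to be JSON with a `predictions` list whose entries carry a `label` field (only the `label` field is visible in the excerpt above).

```python
import json
import urllib.request
import uuid

# Taken from the curl example; adjust if the model runs elsewhere.
PREDICT_URL = "http://localhost:5000/model/predict"


def build_multipart(field, filename, payload, boundary=None):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def extract_labels(response_text):
    """Collect "label" values; the "predictions" key is an assumed response shape."""
    data = json.loads(response_text)
    return [p["label"] for p in data.get("predictions", []) if "label" in p]


def predict(wav_path, url=PREDICT_URL):
    """POST a WAV file to the model and return the predicted labels."""
    with open(wav_path, "rb") as f:
        body, content_type = build_multipart("audio", "audio.wav", f.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return extract_labels(response.read().decode("utf-8"))
```

With the container running locally, `predict("assets/thunder.wav")` would return the list of labels from the response, provided the response matches the assumed shape.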
Complete the node-red-contrib-model-asset-exchange module setup instructions and import the audio-classifier getting started flow.