Audio Classifier

Overview

This model takes a signed 16-bit PCM WAV file as input, generates embeddings, applies PCA transformation/quantization, feeds the embeddings into a multi-attention classifier, and outputs the top 5 class predictions with their probabilities. The model currently supports 527 classes, which are part of the AudioSet Ontology. The classes and their label_ids can be found in class_labels_indices.csv. The model was trained on AudioSet as described in the paper ‘Multi-level Attention Model for Weakly Supervised Audio Classification’ by Yu et al.
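
As a minimal sketch, the label_ids returned by the model can be mapped back to display names with that CSV file. This assumes the standard AudioSet column layout (index, mid, display_name) and a local copy of class_labels_indices.csv; adjust the path to wherever the file lives in your checkout.

import csv

# Sketch: build a label_id -> display name lookup from class_labels_indices.csv,
# assuming the standard AudioSet columns: index, mid, display_name.
def load_label_map(csv_path="class_labels_indices.csv"):
    label_map = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            # "mid" holds identifiers such as /m/06mb1 ("Rain")
            label_map[row["mid"]] = row["display_name"]
    return label_map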

The model has been tested across multiple audio classes; however, it tends to perform best for the Music and Speech categories. This is largely due to the bias towards these classes in the training dataset (90% of the audio clips belong to one of these categories). Although the model is trained on data from AudioSet, which was extracted from YouTube videos, it can be applied to a wide range of audio files outside the music/speech domain. The test assets provided with this model cover a broad range of categories.
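
Because the service expects signed 16-bit PCM WAV input, it can be worth sanity-checking a local file before sending it. The snippet below is a small illustration using Python's standard wave module; the file path is only an example.

import wave

# Sketch: confirm a file looks like signed 16-bit PCM WAV
# (uncompressed, 2 bytes per sample) before sending it to the model.
def is_16bit_pcm_wav(path):
    with wave.open(path, "rb") as w:
        return w.getcomptype() == "NONE" and w.getsampwidth() == 2

print(is_16bit_pcm_wav("assets/thunder.wav"))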

Model Metadata

Domain: Audio
Application: Classification
Industry: Multi
Framework: Keras/TensorFlow
Training Data: Google AudioSet
Input Data Format: signed 16-bit PCM WAV or MP3 audio file

References

  • ‘Multi-level Attention Model for Weakly Supervised Audio Classification’, Yu et al.
  • Google AudioSet

Licenses

Component                 License      Link
Model GitHub Repository   Apache 2.0   LICENSE
Model Files               Apache 2.0   AudioSet
Model Code                MIT          AudioSet Classification
Test assets               Various      Asset README

Options available for deploying this model

  • Deploy from Docker Hub:
    docker run -it -p 5000:5000 codait/max-audio-classifier
    
  • Deploy on Kubernetes:
    kubectl apply -f https://raw.githubusercontent.com/IBM/MAX-Audio-Classifier/master/max-audio-classifier.yaml
    
  • Locally: follow the instructions in the model README on GitHub
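
Whichever option you choose, a quick reachability check can confirm the service is up before you send audio. The sketch below assumes the port mapping from the Docker command above and uses Python's requests library; adjust the host and port for a Kubernetes or remote deployment.

import requests

# Sketch: verify the deployed service answers on port 5000.
# Host and port mirror the docker run mapping shown above.
resp = requests.get("http://localhost:5000", timeout=5)
print("service reachable:", resp.status_code == 200)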

Example Usage

You can test or use this model in the following ways.

Test the model using cURL

Once deployed, you can test the model from the command line. For example, if running locally:

curl -F "audio=@assets/thunder.wav" -XPOST http://localhost:5000/model/predict
{
    "status": "ok",
    "predictions": [
        {
            "label_id": "/m/06mb1",
            "label": "Rain",
            "probability": 0.7376469373703003
        },
        {
            "label_id": "/m/0ngt1",
            "label": "Thunder",
            "probability": 0.60517817735672
        },
        {
            "label_id": "/t/dd00038",
            "label": "Rain on surface",
            "probability": 0.5905200839042664
        }
    ]
}
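
The same request can be made from Python. The sketch below mirrors the cURL call above using the requests library; the file path, host, and port come from the example and should be adjusted for your deployment.

import requests

# Sketch: POST a WAV file to the /model/predict endpoint, mirroring the
# cURL example above, and print the returned labels and probabilities.
with open("assets/thunder.wav", "rb") as audio_file:
    resp = requests.post(
        "http://localhost:5000/model/predict",
        files={"audio": audio_file},
    )
resp.raise_for_status()

for pred in resp.json()["predictions"]:
    print(f"{pred['label']}: {pred['probability']:.3f}")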

Test the model in a Node-RED flow

Complete the node-red-contrib-model-asset-exchange module setup instructions and import the audio-classifier getting started flow.