Overview

This model takes a signed 16-bit PCM WAV file as input, generates embeddings, applies PCA transformation/quantization, feeds the embeddings into a multi-level attention classifier, and outputs the top 5 class predictions with their probabilities. The model currently supports 527 classes, which are part of the AudioSet Ontology. The classes and label_ids can be found in class_labels_indices.csv. The model was trained on AudioSet as described in the paper ‘Multi-level Attention Model for Weakly Supervised Audio Classification’ by Yu et al.
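
The pipeline can be summarized in a short sketch. The snippet below is illustrative only: embed, pca, and classifier are hypothetical stand-ins for the model's internal stages, not a real API; only the WAV loading (scipy.io.wavfile.read) is an actual library call.

    import numpy as np
    from scipy.io import wavfile

    # Load a signed 16-bit PCM WAV file; scipy returns the sample rate
    # and an int16 sample array for 16-bit PCM input.
    rate, samples = wavfile.read("samples/thunder.wav")

    # Hypothetical stand-ins for the internal stages described above:
    embeddings = embed(samples, rate)     # audio -> frame-level embeddings
    reduced = pca.transform(embeddings)   # PCA transformation/quantization
    scores = classifier.predict(reduced)  # multi-level attention classifier

    # Report the top 5 of the 527 AudioSet classes.
    for idx in np.argsort(scores)[::-1][:5]:
        print(idx, scores[idx])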

The model has been tested across multiple audio classes; however, it tends to perform best for the Music and Speech categories, largely because of the bias towards these classes in the training dataset (90% of the audio clips belong to one of these two categories). Although the model is trained on AudioSet data extracted from YouTube videos, it can be applied to a wide range of audio files outside the music/speech domain. The test assets provided with this model cover a broad range of categories.

Model Metadata

| Domain | Application    | Industry | Framework        | Training Data   | Input Data Format                       |
|--------|----------------|----------|------------------|-----------------|-----------------------------------------|
| Audio  | Classification | Multi    | Keras/TensorFlow | Google AudioSet | signed 16-bit PCM WAV or MP3 audio file |
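
Note that the model expects signed 16-bit PCM samples. If your WAV file uses a different sample type, the following is a minimal conversion sketch, assuming numpy and scipy are available (writing an int16 array with scipy.io.wavfile.write produces signed 16-bit PCM):

    import numpy as np
    from scipy.io import wavfile

    # Read the source WAV (MP3 and other formats need a decoder first).
    rate, data = wavfile.read("input.wav")

    # Scale to the int16 range; an int16 array is written as 16-bit PCM.
    floats = data.astype(np.float32)
    peak = float(np.max(np.abs(floats)))
    if peak > 0:
        floats /= peak
    wavfile.write("input_pcm16.wav", rate, (floats * 32767).astype(np.int16))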

References

  • Yu et al., ‘Multi-level Attention Model for Weakly Supervised Audio Classification’ (model architecture and training)
  • Google AudioSet (training data)

Licenses

| Component               | License    | Link                    |
|-------------------------|------------|-------------------------|
| Model GitHub Repository | Apache 2.0 | LICENSE                 |
| Model Files             | Apache 2.0 | AudioSet                |
| Model Code              | MIT        | AudioSet Classification |
| Test assets             | Various    | Samples README          |

Options available for deploying this model

  • Deploy from Docker Hub (to verify the running service afterwards, see the sketch after this list):

    docker run -it -p 5000:5000 codait/max-audio-classifier
    
  • Deploy on Red Hat OpenShift:

    Follow the instructions for the OpenShift web console or the OpenShift Container Platform CLI in this tutorial and specify codait/max-audio-classifier as the image name.

  • Deploy on Kubernetes:

    kubectl apply -f https://raw.githubusercontent.com/IBM/MAX-Audio-Classifier/master/max-audio-classifier.yaml
    

    A more elaborate tutorial on how to deploy this MAX model to production on IBM Cloud can be found here.

  • Locally: follow the instructions in the model README on GitHub
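
Whichever option you choose, you can quickly verify that the service is up once it starts. Below is a minimal sketch, assuming Python's requests library is installed and the default port 5000 from the commands above; it checks that the server root (which serves the API documentation page) responds with HTTP 200:

    import requests

    # The model server listens on port 5000 (see the deployment commands above).
    resp = requests.get("http://localhost:5000/")
    if resp.status_code == 200:
        print("Service is up")
    else:
        print("Unexpected status:", resp.status_code)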

Example Usage

You can test or use this model in any of the following ways:

Test the model using cURL

Once deployed, you can test the model from the command line. For example, if running locally:

curl -F "audio=@samples/thunder.wav" -XPOST http://localhost:5000/model/predict
{
    "status": "ok",
    "predictions": [
        {
            "label_id": "/m/06mb1",
            "label": "Rain",
            "probability": 0.7376469373703003
        },
        {
            "label_id": "/m/0ngt1",
            "label": "Thunder",
            "probability": 0.60517817735672
        },
        {
            "label_id": "/t/dd00038",
            "label": "Rain on surface",
            "probability": 0.5905200839042664
        }
    ]
}
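
The same request can be issued from Python. A minimal sketch, assuming the requests library is installed; the file is posted under the audio form field, matching the -F "audio=@..." flag in the curl call above:

    import requests

    # POST the WAV file as multipart form data, field name "audio".
    with open("samples/thunder.wav", "rb") as f:
        resp = requests.post("http://localhost:5000/model/predict",
                             files={"audio": f})

    resp.raise_for_status()
    for pred in resp.json()["predictions"]:
        print(pred["label"], pred["probability"])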

Test the model in a Node-RED flow

Complete the node-red-contrib-model-asset-exchange module setup instructions and import the audio-classifier getting started flow.

Test the model in CodePen

Learn how to send an audio clip to the model in CodePen.

Test the model in a serverless app

You can use this model in a serverless application by following the instructions in the Leverage deep learning in IBM Cloud Functions tutorial.