
Audio Embedding Generator


This model accepts a signed 16-bit PCM WAV file as input, generates embeddings, applies a PCA transformation and quantization, and outputs the result as arrays of 1-second embeddings. The model was trained on AudioSet. As described in the code, this model is intended to be used as an example and perhaps as a stepping stone for more complex models. See the Usage heading on the tensorflow/models GitHub page for more ideas about potential uses.
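Since the model expects signed 16-bit PCM WAV input, a quick way to produce a compatible test file is with Python's standard-library wave module. This is a sketch: the file name, 16 kHz sample rate, mono channel, and sine-wave content are illustrative choices, not requirements stated by the model documentation.

```python
# Sketch: write a 1-second signed 16-bit PCM WAV file to use as test input.
# Sample rate, channel count, and tone frequency are illustrative assumptions.
import math
import struct
import wave

SAMPLE_RATE = 16000  # illustrative; any PCM rate produces a valid WAV


def write_test_wav(path, seconds=1.0, freq=440.0):
    """Write a mono 16-bit PCM sine tone of the given length."""
    n_frames = int(SAMPLE_RATE * seconds)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)           # mono
        w.setsampwidth(2)           # 2 bytes per sample = signed 16-bit PCM
        w.setframerate(SAMPLE_RATE)
        frames = b"".join(
            struct.pack(
                "<h",
                int(16383 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)),
            )
            for i in range(n_frames)
        )
        w.writeframes(frames)


write_test_wav("tone.wav")
```

The resulting tone.wav can then be posted to a deployed instance of the model in place of the bundled sample files.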

Model Metadata

Domain | Application | Industry | Framework | Training Data | Input Data Format
Audio | Embeddings | Multi | TensorFlow | Google AudioSet | signed 16-bit PCM WAV audio file



Component | License | Link
Model GitHub Repository | Apache 2.0 | LICENSE
Model Files | Apache 2.0 | AudioSet
Model Code | Apache 2.0 | AudioSet
Test assets | Various | Samples README

Options available for deploying this model

  • Deploy from Dockerhub:

    docker run -it -p 5000:5000 codait/max-audio-embedding-generator
  • Deploy on Red Hat OpenShift:

    Follow the instructions for the OpenShift web console or the OpenShift Container Platform CLI in this tutorial and specify codait/max-audio-embedding-generator as the image name.

  • Deploy on Kubernetes:

    kubectl apply -f

    A more elaborate tutorial on how to deploy this MAX model to production on IBM Cloud can be found here.

  • Locally: follow the instructions in the model README on GitHub

Example Usage

Once deployed, you can test the model from the command line. For example:

curl -F "audio=@samples/car-horn.wav" -XPOST http://localhost:5000/model/predict

The JSON response (truncated here) has the form:

{
  "status": "ok",
  "result": [
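The same request can also be made and parsed from Python. This is a sketch: the endpoint URL and form-field name ("audio") come from the curl example above, but the exact layout of the response beyond the "status" and "result" keys, and the sample values below, are assumptions for illustration.

```python
# Sketch: call a running model server from Python and extract the
# per-second embedding arrays from its JSON response.
import json


def predict(wav_path, url="http://localhost:5000/model/predict"):
    """POST a WAV file to a deployed model instance and return parsed JSON."""
    import requests  # deferred import: third-party, only needed for the live call
    with open(wav_path, "rb") as f:
        resp = requests.post(url, files={"audio": f})
    resp.raise_for_status()
    return resp.json()


def extract_embeddings(payload):
    """Pull the embedding arrays out of a prediction response."""
    if payload.get("status") != "ok":
        raise RuntimeError("prediction failed: %r" % payload)
    return payload["result"]


# Parsing a hypothetical, truncated response body:
sample = json.loads('{"status": "ok", "result": [[1, 0, 2], [3, 1, 0]]}')
print(extract_embeddings(sample))  # -> [[1, 0, 2], [3, 1, 0]]
```

With a server deployed as described above, predict("samples/car-horn.wav") would return the full response object, and extract_embeddings would yield one array per second of input audio.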

Resources and Contributions

If you are interested in contributing to the Model Asset Exchange project or have any queries, please follow the instructions here.