Audio Embedding Generator


Overview

This model accepts a signed 16-bit PCM WAV file as input, generates embeddings, applies a PCA transformation and quantization, and outputs the result as arrays of 1-second embeddings. The model was trained on AudioSet. As described in the code, this model is intended to be used as an example and perhaps as a stepping stone for more complex models. See the Usage heading in the tensorflow/models GitHub page for more ideas about potential uses.
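
Because the service expects signed 16-bit PCM WAV input, it can help to verify a file's format before sending it. The following is a minimal sketch using Python's standard wave module; the file path is taken from the example further down and is only illustrative.

import wave

def check_wav(path):
    # Report basic format info and warn if the file is not 16-bit PCM.
    with wave.open(path, "rb") as wav:
        sample_width = wav.getsampwidth()  # bytes per sample; 2 means 16-bit
        channels = wav.getnchannels()
        frame_rate = wav.getframerate()
        duration = wav.getnframes() / frame_rate
    if sample_width != 2:
        print("warning: %s is %d-bit, expected signed 16-bit PCM" % (path, 8 * sample_width))
    print("%s: %d channel(s), %d Hz, %.1f s" % (path, channels, frame_rate, duration))
    return duration

check_wav("assets/car-horn.wav")  # sample file used in the example below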

Model Metadata

Domain: Audio
Application: Embeddings
Industry: Multi
Framework: TensorFlow
Training Data: Google AudioSet
Input Data Format: signed 16-bit PCM WAV audio file

Licenses

Component                 License      Link
Model GitHub Repository   Apache 2.0   LICENSE
Model Files               Apache 2.0   AudioSet
Model Code                Apache 2.0   AudioSet
Test assets               Various      Asset README

Options available for deploying this model

  • Deploy from Docker Hub:
docker run -it -p 5000:5000 codait/max-audio-embedding-generator
  • Deploy on Kubernetes:
kubectl apply -f https://raw.githubusercontent.com/IBM/MAX-Audio-Embedding-Generator/master/max-audio-embedding-generator.yaml
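
Either way, once the container is running you can confirm the service responds before sending audio. The sketch below assumes the service is reachable on localhost:5000 and that it exposes the usual MAX /model/metadata endpoint alongside /model/predict; adjust the host and port for your deployment.

import requests

# Assumed endpoint: MAX model servers normally publish model metadata here.
resp = requests.get("http://localhost:5000/model/metadata", timeout=10)
resp.raise_for_status()
print(resp.json())  # model id, name, description, license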

Example Usage

Once deployed, you can test the model from the command line. For example:

curl -F "audio=@assets/car-horn.wav" -XPOST http://localhost:5000/model/predict
{
  "status": "ok",
  "result": [
    [
      158,
      23,
      150,
      ...
    ],
    ...,
    ...,
    [
      163,
      29,
      178,
      ...
    ]
  ]
}
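
The result field holds one quantized embedding per second of audio. Below is a minimal Python sketch for calling the same endpoint and loading the response into NumPy; the 128-dimension comment reflects the AudioSet/VGGish embedding format and is an assumption, not something the response itself states.

import numpy as np
import requests

with open("assets/car-horn.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:5000/model/predict",
        files={"audio": ("car-horn.wav", f, "audio/wav")},
    )
resp.raise_for_status()

# Each inner list is the quantized embedding for one second of audio.
embeddings = np.array(resp.json()["result"], dtype=np.uint8)
print(embeddings.shape)  # e.g. (seconds_of_audio, 128); 128 assumed from VGGish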