Speech to Text Converter

Overview

This model converts speech into text. It takes a short (~5 second), single-channel WAV file containing English speech as input and returns the predicted transcription as a string.

The model expects 16 kHz audio; input at any other sample rate is resampled to 16 kHz, which will likely reduce transcription accuracy.
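
To avoid the accuracy loss from resampling, you can convert your audio to 16 kHz, 16-bit mono yourself before sending it to the model. For example, with ffmpeg (assuming ffmpeg is installed and your source file is named input.wav):

ffmpeg -i input.wav -ar 16000 -ac 1 -acodec pcm_s16le output.wav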

The code for this model comes from Mozilla’s Project DeepSpeech and is based on Baidu’s Deep Speech research paper.

Model Metadata

Domain:              Audio
Application:         Speech Recognition
Industry:            General
Framework:           TensorFlow
Training Data:       Mozilla Common Voice
Input Data Format:   Audio (16 bit, 16 kHz, mono WAV file)

Licenses

Component                  License                       Link
Model Github Repository    Apache 2.0                    LICENSE
Model Weights              Mozilla Public License 2.0    Mozilla DeepSpeech
Model Code (3rd party)     Mozilla Public License 2.0    DeepSpeech LICENSE
Test assets                Various                       Asset README

Options available for deploying this model

This model can be deployed using the following mechanisms:

  • Deploy from Docker Hub:

To run the docker image, which automatically starts the model serving API, run:

docker run -it -p 5000:5000 codait/max-speech-to-text-converter

This will pull a pre-built image from Docker Hub (or use a locally cached copy if one exists) and run it. If you'd rather check out and build the model locally, you can follow the run-locally steps below.
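
If you prefer to keep the model running in the background, you can also start the container in detached mode and follow its logs (the container name max-stt below is just an example):

docker run -d -p 5000:5000 --name max-stt codait/max-speech-to-text-converter
docker logs -f max-stt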

  • Deploy on Kubernetes:

You can also deploy the model on Kubernetes using the latest docker image on Docker Hub.

On your Kubernetes cluster, run the following command:

kubectl apply -f https://raw.githubusercontent.com/IBM/max-speech-to-text-converter/master/max-speech-to-text-converter.yaml

The model will be available internally at port 5000, but can also be accessed externally through the NodePort.
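
To find the NodePort that was assigned, you can inspect the service with kubectl (the service name below is assumed to match the one defined in the deployment YAML; adjust it if yours differs):

kubectl get svc max-speech-to-text-converter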

Test the model using cURL

Once deployed, you can test the model from the command line. For example, if running locally:

curl -F "audio=@assets/8455-210777-0068.wav" -X POST http://localhost:5000/model/predict
{"status": "ok", "prediction": "your power is sufficient i said"}