Speech to Text Converter

This model converts speech to text. It takes a short (~5 second), single-channel WAV file containing English-language speech as input and returns a string containing the predicted text.

The model expects 16 kHz audio and will resample any input recorded at a different sample rate. Note that resampling will likely reduce the accuracy of the model.
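To avoid the resampling penalty, it can help to check a file before sending it. The following is a minimal sketch using only the Python standard library `wave` module; the function name `check_wav` and the wording of the messages are illustrative and not part of the model's code. It compares a file against the 16-bit, 16 kHz, mono format described on this page.

```python
import wave

# Input format described on this page: 16 bit, 16 kHz, mono WAV.
EXPECTED_RATE_HZ = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bit


def check_wav(path):
    """Return (problems, duration_seconds) for the WAV file at `path`.

    `problems` is a list of human-readable mismatches against the expected
    input format; it is empty when the file matches.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        duration = wav.getnframes() / rate

    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS}")
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"{8 * width}-bit samples, expected 16-bit")
    return problems, duration
```

A file that reports a sample-rate mismatch will still be accepted by the model, but it will be resampled first.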

The code for this model comes from Mozilla’s Project DeepSpeech and is based on Baidu’s Deep Speech research paper.

Model Metadata

Domain:             Audio
Application:        Speech Recognition
Industry:           General
Framework:          TensorFlow
Training Data:      Mozilla Common Voice
Input Data Format:  Audio (16 bit, 16 kHz, mono WAV file)



Component                 License                       Link
Model GitHub Repository   Apache 2.0                    LICENSE
Model Weights             Mozilla Public License 2.0    Mozilla DeepSpeech
Model Code (3rd party)    Mozilla Public License 2.0    DeepSpeech LICENSE
Test assets               Various                       Samples README

Options available for deploying this model

This model can be deployed using the following mechanisms:

  • Deploy from Dockerhub:

    To run the Docker image, which automatically starts the model serving API, run:

    docker run -it -p 5000:5000 codait/max-speech-to-text-converter

    This will pull a pre-built image from Docker Hub (or use an existing image if already cached locally) and run it. If you’d rather check out and build the model locally, you can follow the run locally steps below.

  • Deploy on Red Hat OpenShift:

    Follow the instructions for the OpenShift web console or the OpenShift Container Platform CLI in this tutorial and specify codait/max-speech-to-text-converter as the image name.

  • Deploy on Kubernetes:

    You can also deploy the model on Kubernetes using the latest Docker image on Docker Hub.

    On your Kubernetes cluster, run the following command:

    kubectl apply -f https://raw.githubusercontent.com/IBM/max-speech-to-text-converter/master/max-speech-to-text-converter.yaml

    The model will be available internally at port 5000, but can also be accessed externally through the NodePort.

  • Locally: follow the instructions in the model README on GitHub

Test the model using cURL

Once deployed, you can test the model from the command line. For example, if running locally:

curl -F "audio=@samples/8455-210777-0068.wav" -X POST http://localhost:5000/model/predict
{"status": "ok", "prediction": "your power is sufficient i said"}
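The same request can be made from Python. The sketch below uses only the standard library and mirrors the cURL call above: it posts the file as the multipart form field `audio` to `/model/predict` and reads the `prediction` field from the JSON response. The helper names (`build_multipart`, `parse_prediction`, `predict`) are illustrative, and the error handling assumes that any `status` other than `"ok"` indicates failure, which this page does not document.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, payload, content_type="audio/wav"):
    """Encode a single file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def parse_prediction(raw):
    """Extract the predicted text from the JSON response body."""
    result = json.loads(raw)
    # Assumption: a status other than "ok" means the request failed.
    if result.get("status") != "ok":
        raise RuntimeError(f"unexpected response: {result}")
    return result["prediction"]


def predict(wav_path, url="http://localhost:5000/model/predict"):
    """POST a WAV file to the model serving API and return the predicted text."""
    with open(wav_path, "rb") as f:
        body, content_type = build_multipart(
            "audio", os.path.basename(wav_path), f.read()
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return parse_prediction(response.read())
```

With the container from the Docker Hub option running locally, `predict("samples/8455-210777-0068.wav")` sends the same request as the cURL example.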