Overview

This model converts spoken English into text. It takes a short (about 5 seconds), single-channel WAV file containing English speech as input and returns a string containing the predicted transcription.

The model expects 16 kHz audio and will resample the input if it has a different sample rate. Note that resampling will likely reduce the accuracy of the model.
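
Because the input format affects accuracy, it can help to inspect a file before sending it. The following is a minimal sketch, assuming an uncompressed PCM WAV file and using only Python's standard `wave` module; the helper names `describe_wav` and `matches_expected_format` are hypothetical and not part of the model's code.

```python
import wave


def describe_wav(path):
    """Return basic format facts about an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        bits_per_sample = wav.getsampwidth() * 8  # sample width is in bytes
        sample_rate_hz = wav.getframerate()
        frames = wav.getnframes()
    return {
        "channels": channels,
        "bits_per_sample": bits_per_sample,
        "sample_rate_hz": sample_rate_hz,
        "duration_s": frames / float(sample_rate_hz),
    }


def matches_expected_format(info):
    """True when the file is mono, 16 bit, 16 kHz, as the model expects."""
    return (
        info["channels"] == 1
        and info["bits_per_sample"] == 16
        and info["sample_rate_hz"] == 16000
    )
```

For example, `matches_expected_format(describe_wav("samples/8455-210777-0068.wav"))` would report whether that sample needs resampling. The check does not enforce the "short" duration guideline; `duration_s` is returned so the caller can decide.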

The code for this model comes from Mozilla’s Project DeepSpeech and is based on Baidu’s Deep Speech research paper.

Model Metadata

Domain | Application        | Industry | Framework  | Training Data        | Input Data Format
Audio  | Speech Recognition | General  | TensorFlow | Mozilla Common Voice | Audio (16 bit, 16 kHz, mono WAV file)

Licenses

Component               | License                    | Link
Model GitHub Repository | Apache 2.0                 | LICENSE
Model Weights           | Mozilla Public License 2.0 | Mozilla DeepSpeech
Model Code (3rd party)  | Mozilla Public License 2.0 | DeepSpeech LICENSE
Test assets             | Various                    | Samples README

Options available for deploying this model

This model can be deployed using the following mechanisms:

  • Deploy from Docker Hub:

    To run the Docker image, which automatically starts the model serving API, run:

    docker run -it -p 5000:5000 codait/max-speech-to-text-converter
    

    This will pull a pre-built image from Docker Hub (or use an existing image if already cached locally) and run it. If you would rather check out and build the model locally, follow the "Locally" option at the end of this list.

  • Deploy on Red Hat OpenShift:

    Follow the instructions for the OpenShift web console or the OpenShift Container Platform CLI in this tutorial and specify codait/max-speech-to-text-converter as the image name.

  • Deploy on Kubernetes:

    You can also deploy the model on Kubernetes using the latest Docker image on Docker Hub.

    On your Kubernetes cluster, run the following command:

    kubectl apply -f https://raw.githubusercontent.com/IBM/max-speech-to-text-converter/master/max-speech-to-text-converter.yaml
    

    A more elaborate tutorial on how to deploy this MAX model to production on IBM Cloud can be found here.

    The model will be available internally at port 5000, but can also be accessed externally through the NodePort.

  • Locally: follow the instructions in the model README on GitHub
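
Whichever option is used, the API is served on port 5000, and a script that calls it right after starting the container may need to wait for that port to open. The following is a rough sketch, assuming the service is reachable at `localhost:5000` as in the Docker example; `wait_for_port` is a hypothetical helper, and an open port only shows that something is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host="localhost", port=5000, timeout_s=60.0, interval_s=1.0):
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses.

    Returns True if a connection was made, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # The connection is closed again as soon as it succeeds.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

A deployment script could call `wait_for_port()` once after `docker run` and only send requests if it returns `True`. For a Kubernetes deployment accessed through the NodePort, the host and port arguments would need to be replaced with the node address and the assigned port.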

Example Usage

You can test or use this model in the following ways.

Test the model using cURL

Once deployed, you can test the model from the command line. For example, if running locally:

curl -F "audio=@samples/8455-210777-0068.wav" -X POST http://localhost:5000/model/predict
{"status": "ok", "prediction": "your power is sufficient i said"}
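
The same request can be made from a program. The following is a minimal sketch in Python using only the standard library; it assumes the endpoint, the `audio` form field, and the response shape shown in the cURL example above. The helper names (`build_multipart`, `parse_prediction`, `transcribe`) are hypothetical, and the `audio/wav` part content type is an assumption rather than a documented requirement.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field, filename, data, content_type="audio/wav"):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        "--{b}\r\n"
        'Content-Disposition: form-data; name="{f}"; filename="{n}"\r\n'
        "Content-Type: {c}\r\n\r\n"
    ).format(b=boundary, f=field, n=filename, c=content_type).encode("utf-8")
    tail = "\r\n--{b}--\r\n".format(b=boundary).encode("utf-8")
    return head + data + tail, "multipart/form-data; boundary=" + boundary


def parse_prediction(response_text):
    """Extract the transcription from a JSON response like the one above."""
    payload = json.loads(response_text)
    if payload.get("status") != "ok":
        raise ValueError("unexpected status: %r" % payload.get("status"))
    return payload["prediction"]


def transcribe(path, url="http://localhost:5000/model/predict"):
    """POST a WAV file to the prediction endpoint and return the text."""
    with open(path, "rb") as f:
        audio = f.read()
    body, header = build_multipart("audio", os.path.basename(path), audio)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return parse_prediction(response.read().decode("utf-8"))
```

With the model running locally, `transcribe("samples/8455-210777-0068.wav")` would send the same file as the cURL command. Keeping the encoding and parsing in separate functions makes them easy to check without a running server.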

Test the model in a serverless app

You can utilize this model in a serverless application by following the instructions in the Leverage deep learning in IBM Cloud Functions tutorial.

Resources and Contributions

If you are interested in contributing to the Model Asset Exchange project or have any queries, please follow the instructions here.