
I have been building a Unity side project with a teammate across the pond, using the Watson Assistant, Speech to Text, and Text to Speech services to play virtual reality chess. With most of the basic functions done, we noticed a little problem: Speech to Text just didn't understand us when we referred to spaces on our grid, a fundamental part of our project. Instead of hearing "B4," it hears "be for." In some cases, it hears things that aren't even English, like "de for" instead of "D4."

In this tutorial, I explain how to build a custom language model for Watson Speech to Text for a specific domain. While this particular example demonstrates building the model for Unity in C#, the same logic can be applied to other languages and platforms. After reading this tutorial, you should be able to build your own custom language model.

Learning objectives

After completing this tutorial, you should be able to:

  • Identify when a custom language model might be useful
  • Build a custom language model for Unity

Prerequisites

To follow this tutorial, you will need:

  • An IBM Cloud account with an instance of the Speech to Text service
  • Unity
  • The IBM Watson SDK for Unity

Estimated time

It should take you approximately 30 minutes to complete the tutorial. Some of that time will be idle time while the custom model works toward a ready state in between steps.

Steps

Building a custom language model is a 4-step process.

  1. Create a custom language model and return a customization ID.
  2. Add custom words (from an object).
  3. Train the customization.
  4. Use the customization ID to hit a new model (not shown in the following gist/snippet).

After each step, you must wait until the model is back in a ready state by checking its status. The complete code snippet is available at the end of this tutorial.

I recommend training your model separately and just referencing the customization ID in your production code unless you need to do dynamic training.
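One way to wait between steps is to poll the model's status. The following coroutine is a sketch: the GetCustomization call, the Customization type, and its status field are assumed from the older IBM.Watson.DeveloperCloud Unity SDK, so verify them against your SDK version.

```csharp
//  Sketch: poll the customization status until the model is ready.
//  GetCustomization and Customization.status are assumed from the
//  IBM.Watson.DeveloperCloud Unity SDK; verify against your SDK version.
private bool _statusChecked = false;
private string _customizationStatus;

private IEnumerator WaitUntilReady(string customizationID)
{
    while (_customizationStatus != "ready" && _customizationStatus != "available")
    {
        _statusChecked = false;
        _speechToText.GetCustomization(HandleGetCustomization, OnFail, customizationID);
        while (!_statusChecked)
            yield return null;
        //  Avoid hammering the service while the model is still pending or training
        yield return new WaitForSeconds(2f);
    }
}

private void HandleGetCustomization(IBM.Watson.DeveloperCloud.Services.SpeechToText.v1.Customization customization, Dictionary<string, object> customData)
{
    _customizationStatus = customization.status;
    _statusChecked = true;
}
```

A newly created model reports ready once it exists; after training finishes, it moves to available.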

Create a custom language model

To start, you’ll need to create the custom language model and obtain a customization ID.

    _speechToText.CreateCustomization(HandleCreateCustomization, OnFail, "unity-game-board", "en-US_BroadbandModel", "Adding game board domain items");
    while (!_createCustomizationsAdded)
        yield return null;
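The success callback receives the new customization ID, which you store for the later steps. Here is a minimal sketch of that handler; the CustomizationID type and callback signature are assumptions based on the older IBM.Watson.DeveloperCloud Unity SDK, so adjust them to your SDK version.

```csharp
//  Sketch: store the new customization ID for the later steps.
//  The callback signature and CustomizationID type are assumed from the
//  IBM.Watson.DeveloperCloud Unity SDK; check your SDK version.
private string _createdCustomizationID;
private bool _createCustomizationsAdded = false;

private void HandleCreateCustomization(IBM.Watson.DeveloperCloud.Services.SpeechToText.v1.CustomizationID customizationID, Dictionary<string, object> customData)
{
    //  Keep the ID so AddCustomWords and TrainCustomization can reference it
    _createdCustomizationID = customizationID.customization_id;
    _createCustomizationsAdded = true;
}
```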

Add custom words (from an object)

After you have the customization ID and your model is in a ready state, you can add words to it using a corpus, a file path, or an object. For this example, I'm using an object. Additionally, you must use fully qualified type names when your project uses both Speech to Text and Text to Speech, because Word and Words classes exist in both services' namespaces.

        //  Add custom words from object
        IBM.Watson.DeveloperCloud.Services.SpeechToText.v1.Words words = new IBM.Watson.DeveloperCloud.Services.SpeechToText.v1.Words();
        IBM.Watson.DeveloperCloud.Services.SpeechToText.v1.Word w0 = new IBM.Watson.DeveloperCloud.Services.SpeechToText.v1.Word();
        List<IBM.Watson.DeveloperCloud.Services.SpeechToText.v1.Word> wordList = new List<IBM.Watson.DeveloperCloud.Services.SpeechToText.v1.Word>();
        w0.word = "B4";
        w0.sounds_like = new string[1];
        w0.sounds_like[0] = "be for";
        w0.display_as = "B4";
        wordList.Add(w0);
        IBM.Watson.DeveloperCloud.Services.SpeechToText.v1.Word w1 = new IBM.Watson.DeveloperCloud.Services.SpeechToText.v1.Word();
        w1.word = "C4";
        w1.sounds_like = new string[1];
        w1.sounds_like[0] = "see for";
        w1.display_as = "C4";
        wordList.Add(w1);
        words.words = wordList.ToArray();

        _speechToText.AddCustomWords(HandleAddCustomWordsFromObject, OnFail, _createdCustomizationID, words);
        while (!_addCustomWordsFromObject)
            yield return null;
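Hand-writing all 64 squares is tedious; for a full game board you can generate the word list in a loop. This is a sketch: the sounds_like spellings below are guesses in the spirit of "be for" and "see for" above, so test them against your own recordings and adjust.

```csharp
//  Sketch: generate Words for every square A1..H8 instead of listing them by hand.
//  The sounds_like spellings are illustrative guesses; tune them by testing.
string[] letterSounds = { "ay", "be", "see", "de", "e", "ef", "gee", "aitch" };
string[] numberSounds = { "one", "two", "three", "for", "five", "six", "seven", "ate" };
string letters = "ABCDEFGH";

var wordList = new List<IBM.Watson.DeveloperCloud.Services.SpeechToText.v1.Word>();
for (int i = 0; i < 8; i++)
{
    for (int j = 0; j < 8; j++)
    {
        var w = new IBM.Watson.DeveloperCloud.Services.SpeechToText.v1.Word();
        w.word = letters[i].ToString() + (j + 1);                      //  e.g. "B4"
        w.sounds_like = new string[] { letterSounds[i] + " " + numberSounds[j] };
        w.display_as = w.word;                                         //  show "B4" in transcripts
        wordList.Add(w);
    }
}

var words = new IBM.Watson.DeveloperCloud.Services.SpeechToText.v1.Words();
words.words = wordList.ToArray();
```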

Train the customization

After your words have been added and your model is in a ready state, you must train your model.

    _speechToText.TrainCustomization(HandleTrainCustomization, OnFail, _createdCustomizationID);
    while (!_trainCustomization)
        yield return null;

Use customization ID to hit new model

Based on the examples in the IBM Watson SDK for Unity, my Active property looks like the following code. You need to set the customization ID on your Speech to Text instance to use your newly trained model.


public bool Active
    {
        get { return _speechToText.IsListening; }
        set
        {
            if (value && !_speechToText.IsListening)
            {
                _speechToText.CustomizationId = "<YOUR_CUSTOMIZATION_ID_HERE>";
                _speechToText.DetectSilence = true;
                _speechToText.EnableWordConfidence = false;
                _speechToText.EnableTimestamps = false;
                _speechToText.SilenceThreshold = 0.03f;
                _speechToText.MaxAlternatives = 1;
                _speechToText.EnableInterimResults = true;
                _speechToText.OnError = OnError;
                _speechToText.StartListening(OnRecognize);
            }
            else if (!value && _speechToText.IsListening)
            {
                _speechToText.StopListening();
            }
        }
    }
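With the customization active, final results should come back in the display_as form ("B4" rather than "be for"). Here is a sketch of an OnRecognize handler; SpeechRecognitionEvent and its fields are assumed from the same SDK examples, so verify them against your version.

```csharp
//  Sketch: read recognized transcripts from the recognition event.
//  SpeechRecognitionEvent and its fields are assumed from the
//  IBM.Watson.DeveloperCloud Unity SDK examples; verify against your version.
private void OnRecognize(IBM.Watson.DeveloperCloud.Services.SpeechToText.v1.SpeechRecognitionEvent result)
{
    if (result == null || result.results == null)
        return;

    foreach (var res in result.results)
    {
        foreach (var alt in res.alternatives)
        {
            //  With the custom model active, "be for" should surface as "B4"
            Debug.Log(string.Format("{0} ({1})", alt.transcript, res.final ? "final" : "interim"));
        }
    }
}
```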

Summary

Keep in mind that each call to modify the model puts the model in an unready state. Be sure to confirm that your model is in a ready state before moving to the next step. Additionally, if you create a new model, you will get a new customization ID, so be sure to reference the correct one. View the gist that contains all of the custom language model pieces.

This tutorial was adapted from a Medium blog.