Personal assistant devices are one of the main use cases for speech-to-text technology. A “wake word,” such as “Hey Google” or “Alexa,” tells the device to start processing what it hears, typically by streaming the audio to the cloud when a connection is available. You can use the Watson Speech to Text service in a similar way, depending on how you write your client application.

While the libraries and methods differ across target platforms and programming languages, the steps in this tutorial help you understand how to create an application that is always listening but engages only after it hears a “wake word.”

Learning objectives

In this tutorial, you learn how to create a virtual assistant in Node.js that engages with the user only after a “wake word” is heard. It uses several Watson services to handle the verbal dialog between the user and the virtual assistant.

Prerequisites

To complete this tutorial, you need:

  • An IBM Cloud account
  • An NPM environment on your local computer (Mac OS X or Linux)

Estimated time

Completing this tutorial should take approximately 30 minutes. This estimate assumes that you already have an IBM Cloud account and an NPM environment set up on your local computer (Mac OS X or Linux).

You’ll need to create several assets to complete this tutorial, so it’s a good idea to start by creating a new local subdirectory to place them all.

Step 1: Create Watson services

From the IBM Cloud dashboard, click Create Resource and create the “Lite” (free) version of each of the three services this tutorial uses: Watson Assistant, Speech to Text, and Text to Speech.
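
If you prefer to work from the command line, the IBM Cloud CLI can create the same instances. The following commands are a sketch: the instance names are arbitrary, and the service IDs (conversation, speech-to-text, text-to-speech) and region (us-south) are assumptions that you can verify with ibmcloud catalog service-marketplace.

ibmcloud resource service-instance-create my-assistant conversation lite us-south
ibmcloud resource service-instance-create my-speech-to-text speech-to-text lite us-south
ibmcloud resource service-instance-create my-text-to-speech text-to-speech lite us-south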

Configure credentials

  1. Copy and paste the following code into a local file and name it .env.

     # Watson Assistant
     ASSISTANT_WORKSPACE_ID=<add_assistant_workspace_id>
     ASSISTANT_IAM_APIKEY=<add_assistant_iam_apikey>
    
     # Watson Speech to Text
     SPEECH_TO_TEXT_URL=<url>
     SPEECH_TO_TEXT_IAM_APIKEY=<add_speech_to_text_iam_apikey>
    
     # Watson Text to Speech
     TEXT_TO_SPEECH_URL=<url>
     TEXT_TO_SPEECH_IAM_APIKEY=<add_text_to_speech_iam_apikey>
    
  2. Replace the <***> tags with the actual values created for your services.

    You can find the credentials by clicking Service credentials, then View credentials from the window of your created Watson service, as shown in the following image.

    Service credential window

    An additional WORKSPACE_ID value is required to access the Watson Assistant service. To get this value, select Manage, then Launch tool from the window of your Watson Assistant service. From the service instance window, select Skills to display the skills that exist for your service. For this tutorial, we use the Customer Care Sample Skill that comes with the service.

    Sample skill window

  3. Click the option button (highlighted in the previous image) to view all of your skill details and service credentials.

    Skill details
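
Before moving on, you can sanity-check that the .env file is readable. The following snippet is a hypothetical helper, not part of the tutorial code; it assumes the variable names shown above and uses the same dotenv package that run.js loads in Step 3, so run it after the npm install in Step 2.

require('dotenv').config();

// Warn about any credential that is missing or still a placeholder.
['ASSISTANT_WORKSPACE_ID', 'ASSISTANT_IAM_APIKEY',
 'SPEECH_TO_TEXT_IAM_APIKEY', 'TEXT_TO_SPEECH_IAM_APIKEY'].forEach((name) => {
  if (!process.env[name] || process.env[name].startsWith('<')) {
    console.warn('Missing value for ' + name + ' in .env');
  }
});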

Step 2: Create your run-time environment

Now, you need to install the NPM packages that the application requires. The following code shows a sample package.json file that you can place in your local directory.

{
  "name": "chatbot-with-voice-activation-wake-word",
  "version": "1.0.0",
  "description": "Converse with a virtual assistant.",
  "main": "run.js",
  "scripts": {
    "start": "node run.js"
  },
  "dependencies": {
    "dotenv": "^6.0.0",
    "ibm-watson": "^4.0.1",
    "jsonfile": "^4.0.0",
    "mic": "^2.1.1",
    "node-ffprobe": "^1.2.2",
    "play-sound": "^1.1.1",
    "prompt": "^1.0.0"
  }
}

After you’ve created the file, run the install command to download the required packages.

npm install

You might need to install a few audio-related dependencies if they don’t already exist on your system. The mic package records through sox (or arecord on Linux), node-ffprobe needs the ffprobe binary that ships with ffmpeg, and play-sound looks for a command-line audio player such as mplayer.

On OS X

Use the brew command to install:

  • mplayer
  • sox
  • ffmpeg

brew install sox mplayer ffmpeg

On Ubuntu

Use the apt-get command to install:

  • ffmpeg

sudo apt-get install ffmpeg
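
Either way, you can confirm that the binaries the application shells out to are on your PATH before continuing (on Ubuntu, only ffprobe applies):

sox --version
ffprobe -version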

Step 3: Load the virtual assistant code

The following code snippet is a simple Node.js app that uses the Watson services that you just created.

The code performs the following primary functions:

  • Create and initialize instances of the Watson services
  • Create and set up the microphone object
  • Convert audio from the microphone into text
  • Convert text into audio that is then played back through the speaker
  • Conduct a dialog by responding to questions from the user
  • Keep a timer to determine whether the virtual assistant is awake or asleep

Copy and paste the following code into a local file and name it run.js.

require('dotenv').config({ silent: true });

const AssistantV1 = require('ibm-watson/assistant/v1');
const TextToSpeechV1 = require('ibm-watson/text-to-speech/v1');
const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const fs = require('fs');
const mic = require('mic');
const speaker = require('play-sound')({});
const ffprobe = require('node-ffprobe');
var context = {};
var debug = false;
var botIsActive = false;
var startTime = new Date();

const wakeWord = "hey watson";      // if asleep, the phrase that wakes us up

const SLEEP_TIME = 10 * 1000;       // number of secs to wait before falling asleep

/**
 * Configuration and setup
 */

/* Create Watson services. ibm-watson 4.x requires an explicit authenticator;
   the IAM API keys and URLs come from the .env file. */
const { IamAuthenticator } = require('ibm-watson/auth');

const conversation = new AssistantV1({
  version: '2019-02-28',
  authenticator: new IamAuthenticator({ apikey: process.env.ASSISTANT_IAM_APIKEY })
});

const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({ apikey: process.env.SPEECH_TO_TEXT_IAM_APIKEY }),
  url: process.env.SPEECH_TO_TEXT_URL
});

const textToSpeech = new TextToSpeechV1({
  authenticator: new IamAuthenticator({ apikey: process.env.TEXT_TO_SPEECH_IAM_APIKEY }),
  url: process.env.TEXT_TO_SPEECH_URL
});

/* Create and configure the microphone */
const micParams = {
  rate: 44100,
  channels: 2,
  debug: false,
  exitOnSilence: 6
};
const microphone = mic(micParams);
const micInputStream = microphone.getAudioStream();

let pauseDuration = 0;
micInputStream.on('pauseComplete', ()=> {
  console.log('Microphone paused for', pauseDuration, 'seconds.');
  // Resume listening after Watson has finished talking.
  setTimeout(function() {
    microphone.resume();
    console.log('Microphone resumed.');
  }, Math.round(pauseDuration * 1000));
});

/**
 * Functions and main app
 */

/* Convert speech to text. */
const textStream = micInputStream.pipe(
  speechToText.recognizeUsingWebSocket({
    contentType: 'audio/l16; rate=44100; channels=2',
    interimResults: true,
    inactivityTimeout: -1
  })).setEncoding('utf8');

/* Convert text to speech. */
const speakResponse = (text) => {
  var params = {
    text: text,
    accept: 'audio/wav',
    voice: 'en-US_AllisonVoice'
    // voices to choose from:
    // en-US_AllisonVoice
    // en-US_LisaVoice
    // en-US_MichaelVoice
  };

  var writeStream = fs.createWriteStream('output.wav');
  textToSpeech.synthesize(params)
  .then(response => {
    // write the audio version of the text to the wav file
    // (in ibm-watson 4.x, the audio stream is returned in response.result)
    response.result.pipe(writeStream);
  })
  .catch(err => {
    console.log('error:', err);
  });

  writeStream.on('finish', function() {
    // determine length of response to user
    ffprobe('output.wav', function(err, probeData) {
      if (probeData) {
        pauseDuration = probeData.format.duration;
        // pause microphone until response is delivered to user
        microphone.pause();
        // play message to user
        speaker.play('output.wav');
        // restart timer
        startTime = new Date();
      }
    });
  });
  writeStream.on('error', function(err) {
    console.log('Text-to-speech streaming error: ' + err);
  });
};

/* Log Watson Assistant context values, so we can follow along with its logic. */
function printContext(header) {
  if (debug) {
    console.log(header);

    if (context.system) {
      if (context.system.dialog_stack) {
        const util = require('util');
        console.log("     dialog_stack: ['" +
                    util.inspect(context.system.dialog_stack, false, null) + "']");
      }
    }
  }
}

/* Log significant responses from Watson to the console. */
function watsonSays(response) {
  if (typeof(response) !== 'undefined') {
    console.log('Watson says:', response);
  }
}

/* Determine if we are ready to talk, or need a wake up command */
function isActive(text) {
  var elapsedTime = new Date() - startTime;

  if (elapsedTime > SLEEP_TIME) {
    // go to sleep
    startTime = new Date();
    botIsActive = false;
  }

  if (botIsActive) {
    // in active conversation, so stay awake
    startTime = new Date();
    return true;
  } else {
    // we are asleep - did we get a wake up call?
    if (text.toLowerCase().indexOf(wakeWord) > -1) {
      // time to wake up
      console.log("App just woke up");
      botIsActive = true;
    } else {
      // false alarm, go back to sleep
      console.log("App needs the wake up command");
    }
    return botIsActive;
  }
}

/* Keep conversation with user alive until it breaks */
function performConversation() {
  console.log('App is listening, you may speak now.');

  textStream.on('data', (user_speech_text) => {
    console.log('\n\nApp hears: ', user_speech_text);
    if (isActive(user_speech_text)) {
      conversation.message({
        workspaceId: process.env.ASSISTANT_WORKSPACE_ID,
        input: { text: user_speech_text },
        context: context
      })
      .then(response => {
        // carry the conversation state forward into the next turn
        context = response.result.context;

        const watsonResponse = response.result.output.text[0];
        if (watsonResponse) {
          speakResponse(watsonResponse);
        }
        watsonSays(watsonResponse);
      })
      .catch(err => {
        console.log('error:', err);
      });
    }
  });
}

/* Start the app */
microphone.start();
performConversation();

Step 4: Run the application

Use the following command to run the application.

npm start

Some important notes regarding the execution of the virtual assistant:

  • The app starts in sleep mode and only replies when it hears “Hey Watson.”
  • The app returns to sleep mode if it doesn’t hear anything for 10 seconds; you can change this value through the SLEEP_TIME constant in the app.
  • After being engaged, the app responds in accordance with the Customer Care Sample Skill dialog defined in the Watson Assistant service. You can explore the dialog from within the service tooling for help with determining phrases that it will respond to.
  • Whenever the virtual assistant speaks, it pauses the microphone during the phrase so that it doesn’t inadvertently hear itself and get confused.
  • Audio from the app is streamed to a local file named output.wav, which is then played through the speaker.
  • The app generates console output that should help you follow along and debug any issues you might run into.

The following image shows some example output from the virtual assistant.

Virtual assistant sample output
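
Based on the log statements in run.js, a session follows this general shape. The transcript below is illustrative only: the actual replies come from the Customer Care Sample Skill, and the pause length depends on the length of each reply.

App is listening, you may speak now.

App hears:  hey watson
App just woke up
Watson says: <first response from the Customer Care Sample Skill>
Microphone paused for <response length> seconds.
Microphone resumed.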

Note: As you can see, the dialog is less than optimal. When the app goes back to sleep, it should clear the dialog and start fresh. This can be fixed by modifying the sample dialog provided in Watson Assistant, but dialog customization is beyond the scope of this tutorial.
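
If you want to experiment with a client-side fix instead, one option (a sketch, not part of the original tutorial) is to clear the Assistant context in the isActive() function of run.js when the sleep timer expires. With the V1 Assistant API, sending an empty context starts the dialog over from the beginning.

  if (elapsedTime > SLEEP_TIME) {
    // go to sleep and forget the conversation so far
    startTime = new Date();
    botIsActive = false;
    context = {};   // assumption: an empty context restarts the sample dialog
  }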

Troubleshooting

If you’re running the app on Mac OS X and the microphone doesn’t appear to be picking up any sound, ensure that the basic microphone function is working.

sox -d test.wav        # speak into mic, then ctrl-c to exit
sox test.wav -d        # playback
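
On Ubuntu, the equivalent check uses the ALSA utilities (assuming the alsa-utils package is installed):

arecord -d 5 test.wav  # record 5 seconds from the default microphone
aplay test.wav         # playback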

Summary

This tutorial showed how you can use a wake word to initiate dialog with a virtual assistant that is built using IBM Watson services.

If you want a similar solution that works in a browser, look at “Capturing Audio in the Browser for ‘Wake Words’.” Looking for other solutions for streaming? Check out this Node.js SDK example.