IBM Watson offers many services, including Assistant, Knowledge Studio, Discovery, and Personality Insights.

The Watson team also provides multiple Software Development Kits (SDKs) to help you use Watson with various programming languages. The IBM Watson APIs GitHub repo has a full listing of available SDKs.

In this how-to, I introduce an open source Android project built with IBM Watson services. After reading this how-to, you should know how to integrate multiple Watson services into a single application and how to give Watson Assistant voice capabilities. You’ll be able to speak your input and hear Watson Assistant’s response on an Android device. You won’t need to press a button to start and stop speaking; instead, the application detects the start and end of your speech, converts the input speech to text, calls the Watson Assistant API with the extracted text, and converts the returned Watson Assistant response back into speech.

Learning objectives

After completing this how-to, you should know how to:

  • Download and run the open source Android project in Android Studio
  • Set up the downloaded project with credentials for the services used
  • Run the application on the connected device
  • Understand the core methods of the application’s functionality

Prerequisites

To follow this how-to guide, you need to have:

  • An IBM Cloud account with instances of the Watson Assistant, Speech to Text, and Text to Speech services
  • Android Studio installed on your development machine
  • An Android device (or an emulator) on which to run the application

Estimated time

It should take you approximately one hour to complete this how-to: about thirty minutes for the prerequisites and another thirty minutes for the guide itself.

Steps

Note: This how-to was tested with Android Studio 3.0.1, Watson Android SDK v0.4.3, and macOS High Sierra.

Download the repository from GitHub

To download the repository, you have two options:

  1. Go to the folder on your system where you would like to clone the repository and enter the following command from Git Bash or another command-line interface.

    git clone https://github.com/fahadminhas/watson-assistant.git
    
  2. Download the repository as a ZIP file directly from the Watson Assistant with voice capabilities repository page.

Open the project in Android Studio

You’ll work with the downloaded project folder from here on.

  1. If you downloaded the ZIP file, unzip it into a new workspace folder.
  2. Launch Android Studio.
  3. Click File > Open in the menu bar and select the root folder of the unzipped (or cloned) project.
  4. Android Studio builds the project using Gradle.

Note: If you see an Install missing platform(s) and sync project error or Install Build-Tools 26.0.1 and sync project error, then click the hyperlink provided to install the missing platform or tools. After they’re installed, the build should restart and complete successfully.

Set up Watson service credentials

Now, you’ll set up the application with your own hosted services on IBM Cloud. Go to app > res > values > strings.xml. You’ll see a file like the following.

<resources>
    <string name="app_name">WatsonAssistant</string>
    <!-- Watson Conversation Service Credentials -->
    <string name="workspace_id">Your Workspace ID of Watson Assistant</string>
    <string name="conversation_username">Your Watson Assistant Username</string>
    <string name="conversation_password">Your Watson Assistant Password</string>
    <string name="conversation_endpoint">https://gateway.watsonplatform.net/conversation/api</string>

    <!--Watson Speech-To-Text Service Credentials-->
    <string name="STT_username">Your Speech to Text Username</string>
    <string name="STT_password">Your Speech to Text Password</string>

    <!--Watson Text-To-Speech Service Credentials-->
    <string name="TTS_username">Your Text to Speech Username</string>
    <string name="TTS_password">Your Text to Speech Password</string>
</resources>

Copy and paste the credentials for your Watson Assistant, Speech to Text, and Text to Speech services from IBM Cloud into the predefined strings in this file. You’ll see later how these strings are used in the API calls.
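
For example, here’s a minimal sketch of how one of these string resources might be consumed in the code; the exact variable names in MainActivity.java may differ:

// Sketch only: read credentials from res/values/strings.xml inside an Activity
// and pass them to the Watson Assistant (Conversation) service object.
String conversationUsername = getString(R.string.conversation_username);
String conversationPassword = getString(R.string.conversation_password);

ConversationService service =
        new ConversationService(ConversationService.VERSION_DATE_2017_02_03);
service.setUsernameAndPassword(conversationUsername, conversationPassword);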

Build and run the application

Build the application

After providing the credentials for all three Watson services, you can build and run the project. You can run the application on a connected device, or build the APK first and then transfer it to a mobile phone to install manually. Let’s look at both methods.

Run the application on a connected device

Running an Android project on a connected device is an efficient way to debug it. The relevant USB drivers must be installed on your development machine, and you must enable developer mode on the mobile device.

You must first download and install the Google USB Driver through Android Studio’s SDK Manager. Note that if you are developing on macOS or Linux, you do not need to install the USB driver; instead, refer to the Run Apps on a Hardware Device documentation.

In some cases, additional drivers are needed. When creating this guide, which I tested on a Samsung Galaxy S8+ running Android 8.0, I had to install the Samsung USB drivers on Windows.

The last step to use your device for connected debugging is to enable USB debugging, which requires developer mode. Go to the device’s Settings, select About Phone > Software Information, and tap Build number seven times. Go back to Settings, and a new Developer Options menu appears. In it, enable USB Debugging to continue.

Now you’re all set for running and debugging on your connected device. Click the Run button in the Android Studio toolbar, and you should see your connected device in the Select Deployment Target window, as shown below.

Click OK, and the project starts building and running on the connected device.

Generate and install the APK

To install the application as an APK, you must first generate the APK: go to Build, then select Build APK. Android Studio builds the project and generates an APK. After the process is complete, you can open the APK’s location by clicking Show in Explorer in the Event Log.

Transfer this APK file to your mobile device by any means you like (email, SD card, Google Drive, and so on). Open a file explorer on your mobile device, navigate to the location of the transferred APK, and open the APK file. Android might prompt you to allow installation from an unknown source. If so, grant the permission, and the application installs on your phone.

Run the application

The following image shows the application’s main view. An animated Watson GIF runs continuously to convey that Watson is always working.

When the application starts, the previous image displays and a first, blank input is automatically sent to Watson Assistant. This first input works like the Clear text function of the Try it panel in the Watson Assistant tool: it returns the initial greeting message from Watson. After this message, you can begin speaking your input without pressing anything. The application detects the start of your input by itself, and it also detects when you have finished speaking. Note that there might be a delay of a second or two before the Watson response because the application must make sure that your speech has ended. The rest of the conversation follows the same pattern: you give an input, wait for the Watson response, and then give the next input.

Understand the back end and API calls

In this section, you’ll learn how the input is handled, that is, how the APIs are called to convert the speech input into text and to convert the Watson Assistant response back into speech. I’ll walk through the small code snippets that perform the main functions.

I’ll start with the first automatic input mentioned earlier, which is sent from the onCreate() function in the MainActivity.java file.
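
Here is a minimal sketch of how the relevant part of onCreate() might look, based on the flow described in this how-to. The actual MainActivity.java may differ in its details; MicrophoneHelper comes from the Watson Android SDK, and the field names mirror the strings.xml entries shown earlier.

// Sketch only: the actual onCreate() in the project may differ.
@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    // Read the Watson service credentials defined in res/values/strings.xml
    conversation_username = getString(R.string.conversation_username);
    conversation_password = getString(R.string.conversation_password);
    workspace_id = getString(R.string.workspace_id);
    STT_username = getString(R.string.STT_username);
    STT_password = getString(R.string.STT_password);
    TTS_username = getString(R.string.TTS_username);
    TTS_password = getString(R.string.TTS_password);

    // Helper from the Watson Android SDK that wraps the device microphone
    microphoneHelper = new MicrophoneHelper(this);

    // Send the first, blank input to Watson Assistant to kick off the automated loop
    micText = "";
    sendMessage();
}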

If you take a closer look at the onCreate() function, you can see multiple variables being initialized, such as the Watson service variables with their credentials, but there is also a call to the sendMessage() method. This call initiates the entire interaction with Watson. From here on, the flow is automated: the application gets the user’s speech input, converts it to text, sends the text to Watson Assistant, gets the Assistant’s response, and converts it back to speech. I’ll now explain this automation loop, starting with the sendMessage() method shown below.

private void sendMessage() {

    Thread thread = new Thread(new Runnable() {
        public void run() {
            try {
                // Create the Watson Assistant (Conversation) service and set its IBM Cloud credentials
                ConversationService service = new ConversationService(ConversationService.VERSION_DATE_2017_02_03);
                service.setUsernameAndPassword(conversation_username, conversation_password);

                //service.setEndPoint("https://gateway-fra.watsonplatform.net/conversation/api");

                // Build the request with the transcribed user input and the context of the last exchange
                MessageRequest newMessage = new MessageRequest.Builder().inputText(micText).context(context).build();
                MessageResponse response = service.message(workspace_id, newMessage).execute();

                Message outMessage = new Message();
                if (response != null) {
                    if (response.getOutput() != null && response.getOutput().containsKey("text")) {

                        // Take the first text entry of the response as the message to speak
                        ArrayList responseList = (ArrayList) response.getOutput().get("text");
                        if (responseList != null && responseList.size() > 0) {
                            outMessage.setMessage((String) responseList.get(0));
                            outMessage.setId("2");
                        }

                        // Convert the Assistant response to speech and play it
                        final String recMsg = outMessage.getMessage();
                        Thread thread = new Thread(new Runnable() {
                            public void run() {
                                try {
                                    textToSpeech = new TextToSpeech();
                                    textToSpeech.setUsernameAndPassword(TTS_username, TTS_password);

                                    streamPlayer = new StreamPlayer();
                                    streamPlayer.playStream(textToSpeech.synthesize(recMsg, Voice.EN_LISA).execute());

                                    // Clear the last transcription and start listening for the next input
                                    micText = "";
                                    recordMessage();

                                } catch (Exception e) {
                                    e.printStackTrace();
                                }
                            }
                        });
                        thread.start();
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    });
    thread.start();
}

In this method, you first initialize the Watson Assistant object and give it the service’s IBM Cloud credentials. You then create a MessageRequest object that contains the user message; the micText variable inside this object holds the message. You’ll see where micText is set shortly, but in the first call it is a blank string. You pass this request in an API call to Watson Assistant along with the Watson Assistant workspace ID, and you receive the response in a MessageResponse object.

Now, you apply a few checks to see if you have a valid response in your response object. If you do, you create a thread, instantiate a Text to Speech object with its credentials to convert the response into speech, and then call the TTS API, playing the returned audio with a StreamPlayer object. You do all of this in a thread so that the API calls and audio playback happen in the background, which means they do not block or delay the flow of the application.

When the StreamPlayer is done playing the response, you clear the micText variable and call the recordMessage() function, which keeps the application in a loop of giving a response and then automatically listening for the next user input.

//Record a message via Watson Speech to Text
private void recordMessage() {
    speechService = new SpeechToText();
    speechService.setUsernameAndPassword(STT_username, STT_password);

    if (!listening) {
        // Open the device microphone as an input stream
        capture = microphoneHelper.getInputStream(true);

        new Thread(new Runnable() {
            @Override public void run() {
                try {
                    // Stream the microphone audio to Speech to Text over a WebSocket
                    speechService.recognizeUsingWebSocket(capture, getRecognizeOptions(), new MicrophoneRecognizeDelegate());
                } catch (Exception e) {
                    showError(e);
                }
            }
        }).start();
        listening = true;
    }
}

In the try block of the previous method, the Speech to Text API is called. Its parameters are the capture object of MicrophoneInputStream; the result of getRecognizeOptions(), a function that returns a RecognizeOptions object defining how the input is recognized, for example whether interim results are returned in real time and the inactivity timeout, which stops taking input after a specified period of inactive microphone input; and a new object of the custom class MicrophoneRecognizeDelegate, which implements RecognizeCallback. RecognizeCallback is available in the Watson SDK and lets you act on the speech results; see the Interface RecognizeCallback documentation. In particular, its onTranscription(SpeechResults speechResults) method is called whenever speech recognition results are received.
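
For reference, here is a sketch of what getRecognizeOptions() might return. This is an assumption rather than the project’s exact code: interimResults, inactivityTimeout, and contentType are builder methods on the Watson Java SDK’s RecognizeOptions.Builder, and ContentType is the audio-utilities enum from the Watson Android SDK.

// Sketch only: the options used in the project may differ.
private RecognizeOptions getRecognizeOptions() {
    return new RecognizeOptions.Builder()
            .interimResults(true)                        // return partial transcripts while the user speaks
            .inactivityTimeout(2000)                     // stop listening after 2 seconds of microphone inactivity
            .contentType(ContentType.OPUS.toString())    // must match the MicrophoneInputStream encoding
            .build();
}

The following code is the onTranscription(SpeechResults speechResults) callback of MicrophoneRecognizeDelegate.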

public void onTranscription(SpeechResults speechResults) {
    if(speechResults.getResults() != null && !speechResults.getResults().isEmpty()) {
        String text = speechResults.getResults().get(0).getAlternatives().get(0).getTranscript();
        showMicText(text);
    }
}

In the onTranscription method, when you receive the speech results, you extract the transcript and pass it to showMicText(text), whose code snippet is below:

private void showMicText(final String text) {
    runOnUiThread(new Runnable() {
        @Override public void run() {
            micText = text;
            // Wait 2 seconds; if the transcript has not changed, assume the user has finished speaking
            new CountDownTimer(2000, 1000) {

                public void onTick(long millisUntilFinished) {
                }

                public void onFinish() {
                    Log.i("Sequence", micText);
                    if (micText.equals(text)) {
                        Log.i("Sequence 2", micText);
                        stopRec();
                    }
                }
            }.start();
        }
    });
}

In the previous function, you set the value of the global variable micText to the speech result passed to the function. To make sure the input has stopped, you create a 2-second CountDownTimer and check whether the speech result is still the same (assuming that the user has not continued speaking). If the result is unchanged when the CountDownTimer finishes, you call the stopRec() function, shown below:

private void stopRec() {
    try {
        if (listening) {
            // Close the microphone stream and send the final transcript to Watson Assistant
            microphoneHelper.closeInputStream();
            listening = false;
            sendMessage();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

In the previous code, you stop the recording and call the sendMessage() function explained previously, which gets the Watson Assistant response. The flow then continues in the same way.

Summary

In this how-to, you were introduced to the back end of an Android application built on Watson APIs, and you learned how multiple Watson APIs can be integrated into a single environment to create a custom solution. Thanks for reading!