Watson Speech to Text service on Node-RED

Before we created the media manipulation utilities, many people using the Watson Speech to Text service on Node-RED would say, “But I want to speak to it, not just send an audio file!” So as part of a concerted effort that also generated those media utils, we created a microphone node, a camera node, and a file inject node.

Roll on a few hackathons, and people using the Watson Speech to Text node, normally in conjunction with the Watson Conversation node, wondered if they could have a microphone on their own web page. The answer has always been yes, but you need to add the appropriate HTML5 getUserMedia and AudioContext JavaScript to your page. Let me tell you, sample code in this area is not easy to decipher, and it is difficult to boil it down to the bare essentials in a reusable form.
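To give a flavour of what is involved, here is a minimal sketch of the capture side, assuming the 2018-era ScriptProcessorNode API that samples of the day used (the names here are illustrative, not taken from the published sample):

```javascript
// Minimal microphone capture sketch. ScriptProcessorNode has since been
// deprecated in favour of AudioWorklet, but it is what samples used at the time.
const audioContext = new (window.AudioContext || window.webkitAudioContext)();

navigator.mediaDevices.getUserMedia({ audio: true })
  .then((stream) => {
    const source = audioContext.createMediaStreamSource(stream);
    // Buffer size 4096, one input channel, one output channel.
    const processor = audioContext.createScriptProcessor(4096, 1, 1);

    processor.onaudioprocess = (event) => {
      // Raw Float32 PCM samples in the range [-1, 1].
      const samples = event.inputBuffer.getChannelData(0);
      // ...forward the samples to wherever they need to go.
    };

    source.connect(processor);
    processor.connect(audioContext.destination);
  })
  .catch((err) => console.error('Microphone access denied:', err));
```

And that is just the capture; layering permissions, encoding, and transport on top of it is where the samples get hard to follow.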

WebSocket input

Then last week, a colleague of mine stopped me in Hursley and asked if the Speech to Text node could handle WebSocket input. I said no, but the underlying Watson service can, and I had been toying with the idea of allowing Node-RED WebSocket input to feed into the Speech to Text node. He explained that some of his customers had built a web application that used a WebSocket interface to the Watson Speech to Text service, but they were then sending the transcription back to the server to be processed. Wouldn't it be better if they could use Node-RED to accept the WebSocket input, feed it into Speech to Text, and then process the output before returning it to the app? The mobile application would only need to point at the new Node-RED WebSocket endpoint.

So, I added a new streaming configuration, making sure that the existing microphone route continued to work. With the new option, you can wire a WebSocket input node to a Speech to Text node and the output from the Speech to Text node to a WebSocket output node. The node creates a WebSocket interface to the Watson Speech to Text service, creates and manages an authentication token, and renews it when it expires.
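To give an idea of what the node is doing under the covers, here is a hedged sketch of that token lifecycle, assuming the pre-IAM Watson authorization endpoint that was current at the time (the helper names are mine, not the node's):

```javascript
// Sketch of the token handling the node performs; names are illustrative.
const request = require('request');
const WebSocket = require('ws');

const STT_URL = 'https://stream.watsonplatform.net/speech-to-text/api';
const TOKEN_URL = 'https://stream.watsonplatform.net/authorization/api/v1/token';

// Exchange the service credentials for a short-lived token.
function fetchToken(username, password, callback) {
  request.get({
    url: TOKEN_URL,
    qs: { url: STT_URL },
    auth: { user: username, pass: password }
  }, (err, res, body) => {
    if (err || res.statusCode !== 200) {
      return callback(err || new Error('token request failed'));
    }
    callback(null, body);
  });
}

// Open the streaming recognize interface with the current token. When the
// token expires (after an hour), fetch a new one and reconnect rather than
// failing the flow.
function openRecognizeSocket(token) {
  return new WebSocket(
    'wss://stream.watsonplatform.net/speech-to-text/api/v1/recognize' +
    '?watson-token=' + encodeURIComponent(token)
  );
}
```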

To test the new configuration, I needed a web page that created a WebSocket connection to Node-RED and sent audio. So I delved into the AudioContext samples and created a bare-essentials AudioContext and WebSocket web page to test the Speech to Text node. Testing the node took longer than writing the code: Watson tokens expire after one hour, and I had to ensure that the Speech to Text node recovered and renewed the token when needed. Each fix took an hour to test, because I was relying on the Watson Speech to Text service to tell me when the token had expired. Once I had it working, I ran a test and then let my instance of Node-RED run for 24 hours before retesting. It was still working, so the feature was released in version 0.6.3 of the Watson Node-RED nodes.
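The interesting part of the test page is the glue between the capture code and the WebSocket. Here is a hedged sketch (the endpoint path is hypothetical and just needs to match your WebSocket input node, and floatTo16BitPCM is a helper I am naming here for illustration):

```javascript
// Point at the WebSocket input node in your Node-RED flow.
const ws = new WebSocket('ws://localhost:1880/ws/speech');
ws.binaryType = 'arraybuffer';

// Convert Float32 samples in [-1, 1] to 16-bit signed PCM for transmission.
function floatTo16BitPCM(float32) {
  const pcm = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// Inside the onaudioprocess handler shown earlier:
//   if (ws.readyState === WebSocket.OPEN) {
//     ws.send(floatTo16BitPCM(samples).buffer);
//   }

// The transcription (after any server-side processing) comes back the same way.
ws.onmessage = (event) => {
  console.log('Transcription:', event.data);
};
```

One gotcha worth knowing: the AudioContext captures at whatever rate the hardware provides (often 44.1 kHz), so the audio format declared on the Speech to Text side has to match what the page actually sends.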

I also published my test web page, along with a sample Node-RED flow, as a bare-minimum starter.

What’s next?

What will be the next ask at a hackathon? By the way, people have also been asking for a camera on their web page, and we have done that too (a sample flow will be published soon). It, along with the Speech to Text WebSocket feature, was part of my session at Index 2018.
