
Archived | Create a virtual reality speech sandbox

Archived content

Archive date: 2019-08-08

This content is no longer being updated or maintained. The content is provided “as is.” Given the rapid evolution of technology, some content, steps, or illustrations may have changed.


This developer pattern shows you how to build interactive speech systems for virtual reality with just two Watson™ services: Watson Speech-to-Text for transcription and Watson Assistant for parsing the meaning of the transcribed words. You will use the Watson Unity SDK to call both services directly from the Unity development environment.


Virtual reality (VR) enables users to feel like they truly inhabit a different space. In a VR environment, speech is a more natural interface than other methods for certain interactions. You don’t want to pause an experience to stare at a control or, heaven forbid, type a command; you want to be in the moment. The ability to simply speak instructions keeps you in that moment and helps provide an entirely new dimension of immersion for users.

By learning how to add speech controls to VR environments, you can build more richly interactive, immersive experiences – and position your own skills for the next big technology revolution. When you complete this developer pattern, you will understand how to add IBM Watson Speech-to-Text and Assistant services to a virtual reality environment built in Unity, the popular 3D development platform.

Several VR head-mounted devices offer users powerful immersive experiences, and their versatility makes them ideal candidates for speech interaction. This developer pattern shows you how to implement speech controls for three of the most popular: Google Cardboard, HTC Vive, and Oculus Rift.



  1. The user interacts in virtual reality and gives voice commands such as “Create a large black box.”
  2. The microphone on the VR hardware picks up the voice command, and the running application sends the audio to Watson Speech-to-Text.
  3. Watson Speech-to-Text converts the audio to text and returns it to the running application that powers the VR hardware.
  4. The application sends the text to Watson Assistant. Watson Assistant returns the recognized intent “Create” and the entities “large,” “black,” and “box.” The virtual reality application then displays the large black box (which falls from the sky).
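
The last step above, turning a recognized intent and its entities into an object to display, can be sketched in a few lines. This is a minimal, language-neutral illustration in Python rather than the pattern's actual Unity code. The response shape is a simplified assumption, and the names `SpawnCommand`, `interpret`, `SIZES`, `COLORS`, and `SHAPES` are hypothetical helpers introduced only for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical lookup tables: the entity values an application might
# choose to recognize. They are assumptions, not part of any Watson API.
SIZES = {"small": 0.5, "medium": 1.0, "large": 2.0}
COLORS = {"black": (0, 0, 0), "white": (255, 255, 255), "red": (255, 0, 0)}
SHAPES = {"box", "sphere"}


@dataclass
class SpawnCommand:
    """What the VR scene needs to know to create an object."""
    shape: str
    scale: float
    color: Tuple[int, int, int]


def interpret(response: dict) -> Optional[SpawnCommand]:
    """Map an assistant-style response to a SpawnCommand.

    Assumes a simplified response of the form
    {"intents": [{"intent": ...}], "entities": [{"value": ...}]}.
    Returns None when the intent is not "create" or no shape is named.
    """
    intents = response.get("intents", [])
    if not intents or intents[0]["intent"].lower() != "create":
        return None

    values = [entity["value"].lower() for entity in response.get("entities", [])]

    # A shape is required; size and color fall back to defaults.
    shape = next((v for v in values if v in SHAPES), None)
    if shape is None:
        return None
    scale = next((SIZES[v] for v in values if v in SIZES), 1.0)
    color = next((COLORS[v] for v in values if v in COLORS), (255, 255, 255))
    return SpawnCommand(shape, scale, color)


# Simplified stand-in for the result of "Create a large black box."
example = {
    "intents": [{"intent": "Create"}],
    "entities": [{"value": "large"}, {"value": "black"}, {"value": "box"}],
}
command = interpret(example)
```

Keeping the interpretation step separate from rendering means the same command object could drive any of the supported headsets; only the code that consumes `command` would need to know about the VR hardware.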