In this video:
- Bhavik Shah, Senior Offering Manager, IBM Watson
IBM Watson Senior Offering Manager Bhavik Shah discusses the Speech to Text service and the host of recent improvements and new features designed to make it more powerful than ever. He covers the latest enhancements, including language model customization and diarization.
Watson Speech to Text converts spoken audio into written text. Apps that use it can transcribe contact-center calls to identify what is being discussed, decide when to escalate a call, and follow content from multiple speakers. You can also create voice-controlled applications and customize the model to improve accuracy for the language and content you care about most, such as product names, sensitive subjects, or names of individuals.
The service offers three programming interfaces for transcribing speech to text:
- The WebSocket interface provides a single version of the recognize method for transcribing audio
- The HTTP REST interface provides HTTP POST versions of the recognize method that transcribe audio with or without establishing a session with the service
- The asynchronous HTTP interface provides a non-blocking POST recognitions method for transcribing audio
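As a rough illustration of the sessionless HTTP REST option, the sketch below builds (but does not send) a POST request for the recognize method. The base URL is a placeholder, the helper name `build_recognize_request` is hypothetical, and the `/v1/recognize` path and `speaker_labels` query parameter are assumptions to check against your own service credentials and the API reference.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder base URL (assumption): use the endpoint from your service credentials.
BASE_URL = "https://example.invalid/speech-to-text/api"


def build_recognize_request(audio: bytes, content_type: str = "audio/wav", **params) -> Request:
    """Build, without sending, a sessionless POST request to the recognize method."""
    # Booleans are lowercased so that True becomes "true" in the query string.
    query = urlencode(
        {key: str(value).lower() if isinstance(value, bool) else value
         for key, value in sorted(params.items())}
    )
    url = f"{BASE_URL}/v1/recognize" + (f"?{query}" if query else "")
    return Request(url, data=audio, headers={"Content-Type": content_type}, method="POST")


# Two dummy bytes stand in for real audio data.
req = build_recognize_request(b"\x00\x01", speaker_labels=True)
```

Sending the request would additionally require authentication headers and a call such as `urllib.request.urlopen(req)`; both are left out here so the sketch runs without network access.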
The language model customization interface lets you improve the accuracy of speech recognition for domains with industry-specific jargon such as medicine or information technology. Once you’ve customized the model, you can use it with your applications to provide customized speech recognition.
Diarization (also known as speaker diarization) is the process of partitioning an input audio stream into separate segments according to the speaker's identity. With Watson, diarization can run in real time, so your app can apply it to live conversations.
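To show what an app might do with diarization output, here is a minimal sketch that groups transcribed words into speaker turns. It assumes the response provides word timestamps as `[word, start, end]` triples and speaker labels as objects with `from`, `to`, and `speaker` fields; that shape, the sample data, and the helper name `speaker_turns` are assumptions for illustration, not the service's confirmed schema.

```python
def speaker_turns(timestamps, speaker_labels):
    """Group words into consecutive turns, one per uninterrupted run of a speaker."""
    # Map each (start, end) time span to the speaker assigned to it.
    speaker_at = {(label["from"], label["to"]): label["speaker"] for label in speaker_labels}
    turns = []
    for word, start, end in timestamps:
        speaker = speaker_at.get((start, end))  # None if no label matches this span
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(word)  # same speaker keeps talking
        else:
            turns.append((speaker, [word]))  # a new turn begins
    return [(speaker, " ".join(words)) for speaker, words in turns]


# Hypothetical sample data in the assumed shape.
timestamps = [["hello", 0.0, 0.4], ["hi", 0.6, 0.8], ["there", 0.8, 1.1]]
labels = [
    {"from": 0.0, "to": 0.4, "speaker": 0},
    {"from": 0.6, "to": 0.8, "speaker": 1},
    {"from": 0.8, "to": 1.1, "speaker": 1},
]
turns = speaker_turns(timestamps, labels)
```

Matching on exact start and end times keeps the sketch short; a real-time app would likely need to tolerate labels that arrive later than the words they describe.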