Tutorial
Convert speech to text, and extract meaningful insights from data
Combine Watson Speech to Text with the watson_nlp library to transcribe speech data and get insights from that data.
The IBM Watson Speech to Text Service is a speech recognition service that offers many functions such as text recognition, audio preprocessing, noise removal, background noise separation, and semantic sentence conversion. It lets you convert speech into text by using AI-powered speech recognition and transcription.
In this tutorial, walk through the steps of starting a Watson Speech to Text Service, connecting to it through Docker in your local system, preprocessing a speech data set, and using the Watson Speech to Text Service to transcribe speech data. The tutorial also shows you how to extract meaningful insights from data by combining the functions of the Watson Speech to Text Service with the `watson_nlp` library, a common library for natural language processing, document understanding, translation, and trust.
Prerequisites
To follow this tutorial, you must have:
- Your entitlement key to access the IBM Entitled Registry
- Docker installed
Note: Podman provides a Docker-compatible command-line front end. Unless otherwise noted, all of the Docker commands in this tutorial should work for Podman if you simply alias the Docker CLI with the `alias docker=podman` shell command.
Steps
Step 1. Set up the environment
Step 1.1. Log in to the IBM Entitled Registry
The IBM Entitled Registry contains various container images for the Watson Speech to Text Service. After you obtain the entitlement key from the container software library, you can log in to the registry with the key and pull the container images to your local machine. Use the following command to log in to the registry.
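For reference, the login typically looks like the following, where the `cp.icr.io` host and the `cp` username follow IBM Entitled Registry conventions; substitute your own entitlement key.

```sh
# Log in to the IBM Entitled Registry with your entitlement key
docker login cp.icr.io --username cp --password <your-entitlement-key>
```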
Step 1.2. Clone the sample code repository
Clone the sample code repository.
Go to the directory that contains the sample code for this tutorial.
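A sketch of these two steps; the repository URL and subdirectory name here are assumptions based on the Watson Speech GitHub repo referenced later in this tutorial, so use the ones provided with the tutorial if they differ.

```sh
# Clone the sample code repository and enter the tutorial's directory
git clone https://github.com/ibm-build-lab/Watson-Speech.git
cd Watson-Speech/single-container-stt
```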
Step 1.3. Build the container image
Build a container image with the provided `Dockerfile`. The image includes two pretrained models (`en-us-multimedia` and `fr-fr-multimedia`), which support two different languages: English (en_US) and French (fr_FR). You can add more models to support other languages by updating the provided `Dockerfile` as well as the `env_config.json` and `sessionPools.yaml` files in the `chuck_var` directory.
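A minimal build command, assuming you run it from the directory that contains the `Dockerfile`; the image tag is illustrative.

```sh
# Build the Speech to Text container image from the provided Dockerfile
docker build . -t speech-standalone
```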
Step 1.4. Run the container to start the service
Run the container on Docker using the container image that was created in the previous step to start the Watson Speech to Text Service.
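A sketch of the run command; port 1080 is assumed to be the service's listening port here, so adjust the mapping if your configuration differs, and use the image tag from the build step above.

```sh
# Start the service in the foreground and expose it on localhost:1080
docker run --rm -it --publish 1080:1080 speech-standalone
```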
The service runs in the foreground. Now, you can access this service in your notebook or local machine.
Step 2. Watson Speech To Text analysis
Step 2.1. Data loading and setting up the service
Import and initialize some helper libraries that are used throughout the tutorial.
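The exact imports in the notebook may vary; a typical set for this tutorial looks like the following.

```python
import json                      # parse STT JSON responses
import requests                  # call the STT REST endpoint
import librosa                   # load audio files for analysis
import librosa.display           # waveform plotting helpers
import matplotlib.pyplot as plt  # render amplitude plots
import pandas as pd              # tabulate transcripts
```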
Load the voice data.
Create a custom function to plot the amplitude frequency.
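A sketch of these two steps; the file name is illustrative, and `librosa.display.waveshow` assumes librosa 0.9 or later.

```python
def plot_amplitude(audio_path):
    """Load an audio file and plot its amplitude over time."""
    samples, sample_rate = librosa.load(audio_path)
    plt.figure(figsize=(14, 5))
    librosa.display.waveshow(samples, sr=sample_rate)
    plt.title(audio_path)
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.show()

plot_amplitude("sample_audio.wav")  # illustrative file name
```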
Set up the parameters for using the Watson Speech to Text Service.
Create a function to get the values from the Watson Speech to Text Service.
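A minimal helper, assuming the container from step 1.4 is listening on localhost:1080 and exposes the standard `/speech-to-text/api/v1/recognize` endpoint; adjust the URL, model name, and audio content type to match your deployment.

```python
STT_URL = "http://localhost:1080/speech-to-text/api/v1/recognize"
BASE_PARAMS = {"model": "en-US_Multimedia"}  # model baked into the image
HEADERS = {"Content-Type": "audio/wav"}

def transcribe(audio_path, extra_params=None):
    """Send a WAV file to the STT service and return the parsed JSON response."""
    params = dict(BASE_PARAMS, **(extra_params or {}))
    with open(audio_path, "rb") as audio_file:
        response = requests.post(STT_URL, params=params,
                                 headers=HEADERS, data=audio_file)
    response.raise_for_status()
    return response.json()
```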
Step 2.2. Speech data processing
Step 2.2.1. Background audio suppression
Load the speech data, and print the amplitude frequency.
Create a custom function to get the transcribed result without processing.
Remove background noise from the data by using the `background_audio_suppression` query parameter in the request URL. You can see that after suppressing background audio, STT returns a clean, processed transcript.
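For example, using the hypothetical `transcribe()` helper sketched in step 2.1; the file name and suppression level are illustrative, and the parameter accepts values from 0.0 to 1.0.

```python
# Suppress background audio below the given sensitivity level
result = transcribe("sample_noisy_audio.wav",
                    {"background_audio_suppression": 0.5})
print(result["results"][0]["alternatives"][0]["transcript"])
```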
Step 2.2.2. Speech audio parsing
Use the `end_of_phrase_silence_time` parameter for speech audio parsing. You can see that after speech audio parsing, STT returns a clean, processed transcript.
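For example, again with the `transcribe()` helper; the pause interval here is illustrative, and the service default is 0.8 seconds.

```python
# Split the transcript into separate results at pauses longer than 0.2 seconds
result = transcribe("sample_audio.wav", {"end_of_phrase_silence_time": 0.2})
for r in result["results"]:
    print(r["alternatives"][0]["transcript"])
```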
Step 2.2.3. Speaker labels
Set the `speaker_labels` parameter to find the number of speakers in the speech data. Create a custom function to find the number of speakers in the speech data. You can see the speakers in the transcript after adding the `speaker_labels` argument to the STT API call.
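A sketch of both steps, assuming the `transcribe()` helper from step 2.1; the service returns a top-level `speaker_labels` array that assigns a numeric speaker ID to each word.

```python
result = transcribe("sample_call.wav", {"speaker_labels": "true"})

def count_speakers(stt_response):
    """Return the number of distinct speakers labeled in an STT response."""
    return len({label["speaker"]
                for label in stt_response.get("speaker_labels", [])})

print("Number of speakers:", count_speakers(result))
```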
Step 2.2.4. Response formatting and filtering
The Watson Speech to Text Service provides features that you can use to parse transcription results. You can format a final transcript to include more conventional representations of certain strings and to include punctuation. You can redact sensitive numeric information from a final transcript.
Use the `smart_formatting` parameter to get conventional results. You can see that with `smart_formatting=True`, the date, punctuation, time, number, and email address have been formatted correctly. Therefore, response formatting and filtering helps you get a cleaner, processed transcript.
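For example, with the `transcribe()` helper and an illustrative file name:

```python
# With smart formatting enabled, dates, times, numbers, and email addresses
# in the final transcript are rendered in conventional written form
result = transcribe("sample_dates_times.wav", {"smart_formatting": "true"})
print(result["results"][0]["alternatives"][0]["transcript"])
```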
Step 3. Microphone recognition
To record real-time voice, the `SpeechRecognition` and `PyAudio` (v0.2.12) open source Python libraries are used.
Install the open source libraries:
- `pip install SpeechRecognition` or `pip3 install SpeechRecognition` from the terminal, or `!pip3 install SpeechRecognition` from a Jupyter Notebook
- `brew install portaudio` to install the PortAudio dependency (on macOS)
- `pip install pyaudio` or `pip3 install pyaudio` from the terminal, or `!pip3 install pyaudio` from a Jupyter Notebook
Use a microphone to record the audio.
Use the Watson Speech to Text Service to transcribe the recorded audio.
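A minimal sketch of both steps, reusing the endpoint constants from step 2.1; `Recognizer`, `Microphone`, `listen`, and `get_wav_data` are part of the SpeechRecognition library's documented API.

```python
import speech_recognition as sr

# Capture a short utterance from the default microphone
recognizer = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something...")
    audio = recognizer.listen(source)

# Send the captured WAV bytes to the local STT service
response = requests.post(STT_URL, params=BASE_PARAMS, headers=HEADERS,
                         data=audio.get_wav_data())
print(response.json()["results"][0]["alternatives"][0]["transcript"])
```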
Step 4. Transcribe customer calls and extract meaningful insights by using the `watson_nlp` library
You can use the Watson Speech to Text Service to transcribe calls from customer care centers. These transcripts can then be used to extract insights by using the `watson_nlp` library.
Load the customer care call data. The data is available in the same Watson Speech GitHub repo.
Create a function to combine the transcripts into one document.
Process all call center voice data, and create a list of documents.
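A sketch of these steps, with illustrative file names and the hypothetical `transcribe()` helper from step 2.1.

```python
def combine_transcripts(stt_response):
    """Join all phrase-level results of one STT response into one document."""
    return " ".join(r["alternatives"][0]["transcript"]
                    for r in stt_response["results"])

call_files = ["call_1.wav", "call_2.wav", "call_3.wav"]  # illustrative names
documents = [combine_transcripts(transcribe(f)) for f in call_files]
```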
Load the relevant models from the `watson_nlp` library. Extend the stop words list to filter out the common stop words from analysis.
Remove the stop words, and lowercase the text in the transcripts.
Extract the keywords and phrases from the transcribed document.
Remove unigrams and bigrams from the data set, and plot the most frequent phrases.
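A condensed sketch of these steps; the stock model name and the prediction's attribute names follow watson_nlp's published examples but may differ across versions, and the stop word set here is a small illustrative subset.

```python
import watson_nlp
from collections import Counter

# Load a pretrained noun-phrases model (model name assumed from
# watson_nlp's stock catalog; check your installation's model list)
noun_phrases_model = watson_nlp.load("noun-phrases_rbr_en_stock")

# Small illustrative stop word set; extend it with domain-specific terms
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "is", "are",
              "i", "you", "we", "it", "that", "this", "for", "on", "in"}

def clean(text):
    """Lowercase the text and drop stop words."""
    return " ".join(w for w in text.lower().split() if w not in STOP_WORDS)

# Count noun phrases across all cleaned call transcripts
phrase_counts = Counter()
for doc in documents:
    prediction = noun_phrases_model.run(clean(doc))
    for phrase in prediction.noun_phrases:
        phrase_counts[phrase.span.text] += 1

# Keep only phrases of three or more words (drop unigrams and bigrams)
# before plotting the most frequent ones
long_phrases = {p: c for p, c in phrase_counts.items() if len(p.split()) >= 3}
print(sorted(long_phrases.items(), key=lambda kv: -kv[1])[:10])
```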
You can see that some of the most frequent phrases in the recorded calls were "accurate information customers" and "concern file alert equifacts". You can use these types of insights to understand the pain points and major areas of improvement. For example, the customer service team can create self-service content or direct support to help customers rather than trying to determine who needs to speak to the customer. A customer call that includes the words "loan", "mortgage", "loan servicing", or "loan payment issues" could be sent to the loan department for resolution.
Conclusion
This tutorial walked you through the steps of starting a Watson Speech to Text Service, connecting to it through Docker in your local system, preprocessing the speech data set, and using the Watson Speech to Text Service to transcribe speech data. This tutorial also showed you how to extract meaningful insights from data by combining the functions of the Watson Speech to Text Service with the `watson_nlp` library. To try out the service, work through the Watson Speech To Text Analysis notebook.
For more examples of using embeddable AI, see the IBM Developer Embeddable AI page.