Overview

Skill Level: Beginner

In this example we show how to use IBM Watson Speech to Text to recognize speech from audio files stored on Box and enrich their metadata with the extracted text.

Ingredients

Audio files take a lot of work to index properly so that they are easy to look up when needed.

Here is what's needed:

    • A Box account with admin access, to create Metadata templates
    • An IBM Bluemix account, to use the Speech to Text service
    • A Stamplay account, to build the workflow

Step-by-step

  1. Get IBM Watson credentials

    Log in at https://console.bluemix.net/catalog/services/speech-to-text and generate credentials for the Speech to Text service.

    Go to Service Credentials and copy the username and password values; we'll need them shortly.

  2. Create Box Metadata template

    To be able to attach the recognized speech to audio files we need to create a Metadata template on Box:

    1. Access the Metadata section on your Box admin panel
    2. Create a new template called “Audio data”
    3. Create a metadata field named “transcript” of type “text”


    Creating the metadata template on Box
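    For reference, the same template can also be created programmatically through the Box metadata templates API (POST to /2.0/metadata_templates/schema). Note that the "text" field type in the Box UI corresponds to the "string" type in the API. A minimal sketch of the request payload, assuming a templateKey of audioData (Box derives one automatically if it is omitted):

```python
import json

# Payload for POST https://api.box.com/2.0/metadata_templates/schema
# (requires an enterprise admin access token; templateKey "audioData"
# is an assumption for this example)
template = {
    "scope": "enterprise",
    "templateKey": "audioData",
    "displayName": "Audio data",
    "fields": [
        {
            "type": "string",  # "text" in the Box UI maps to "string" in the API
            "key": "transcript",
            "displayName": "transcript",
        }
    ],
}

payload = json.dumps(template)
```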

  3. Step 1: On New File uploaded

    Now let’s open Stamplay’s visual flow builder to put together this automation. The workflow will start every time a new file is uploaded to a given folder.

    • Initialize the project and create a flow
    • Select the Box connector and then Trigger “New File uploaded”
    • Connect your Box account by granting access to Stamplay
    • You can either search for or copy the Id of the folder where audio files will be uploaded (you can find the folder Id from the URL)
    • You’ll be asked to test the trigger, so upload a file in the target folder and wait until Stamplay confirms that data has been received successfully. Then click Save.

  4. Step 2: Get the uploaded file

    The next step is to grab the file that has been uploaded on Box so that we can pass it to IBM Watson.

    • Hover the + icon on the first step of your flow and click on Action
    • Select the Box component and then pick the Download File action
    • This action requires a valid Box file Id that you can easily grab from the previous step by using the data mapper. Click on the button next to the input field to open it.
    • Select the data pill id (body.source.id), it sits right after the data pill type
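    The data pill body.source.id simply points at the source.id field of the event Box sends when a file is uploaded. A sketch of what that mapping does, using a payload shape assumed for illustration:

```python
# Example "FILE.UPLOADED" trigger body (shape assumed for illustration)
event = {
    "body": {
        "trigger": "FILE.UPLOADED",
        "source": {"type": "file", "id": "123456789"},
    }
}

# The data pill body.source.id is just this nested lookup:
file_id = event["body"]["source"]["id"]
```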

  5. Step 3: Recognize the Audio

    Now we’re ready to pass this file to IBM Watson.

    • Add one more action to your workflow
    • Select IBM Watson Speech to Text and then pick Recognize Speech From Audio
    • Paste the credentials that you previously copied from Bluemix, click Connect and then Continue.
    • In the Content Type field, select the audio format that you expect to be uploaded
    • In the second field named File pass the data pill URL that you can grab from the result of the Download File step.
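    Under the hood, this step maps to Watson's asynchronous recognition endpoint, which takes the raw audio as the request body and the audio format in the Content-Type header: the same two inputs the Stamplay action asks for. A sketch of how such a request could be assembled (the Bluemix-era endpoint URL is shown; the helper and sample values are ours):

```python
import base64

# Watson Speech to Text asynchronous recognition endpoint (Bluemix era)
WATSON_URL = "https://stream.watsonplatform.net/speech-to-text/api/v1/recognitions"

def build_recognize_request(username, password, audio_bytes, content_type="audio/mp3"):
    # HTTP Basic auth with the username/password copied from Bluemix;
    # the audio goes in the body, its format in the Content-Type header.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {
        "method": "POST",
        "url": WATSON_URL,
        "headers": {
            "Authorization": f"Basic {token}",
            "Content-Type": content_type,
        },
        "body": audio_bytes,
    }

req = build_recognize_request("user", "pass", b"\x00\x01")
```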

  6. Step 4: Retrieve the result of the Speech Recognition

    Speech recognition is a complex process, so the service doesn't return a result right after it is called. For this reason we need to retrieve the result with a separate action step.

    • Add one more action and select the IBM Watson Speech to Text connector
    • To fill the Job Id field, pass the Id from the result of the previous Recognize Speech From Audio action
    • After that, let’s turn on the flow so we can see if the recognition works fine
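    The retrieval step polls Watson's check-job endpoint (GET /v1/recognitions/{id}) until the status is "completed", at which point the response carries the recognition results. A sketch of pulling the transcript sentences out of a completed-job response (the response shape follows the Watson Speech to Text API; the sample values are made up):

```python
# A made-up completed-job response in the shape Watson returns
job_response = {
    "id": "job-id-from-previous-step",
    "status": "completed",
    "results": [
        {
            "results": [
                {"alternatives": [{"transcript": "hi my name "}], "final": True},
                {"alternatives": [{"transcript": "is giuliano "}], "final": True},
            ],
            "result_index": 0,
        }
    ],
}

def extract_transcripts(job):
    # Each result holds a list of alternatives; the first is the most
    # confident one, which is what the flow iterates over later.
    if job["status"] != "completed":
        return []
    return [
        r["alternatives"][0]["transcript"]
        for batch in job["results"]
        for r in batch["results"]
    ]

sentences = extract_transcripts(job_response)
```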

  7. Checkpoint

    Upload an audio file to the target Box folder. If everything has been configured correctly, the flow will start and after a while we'll get the transcript.

    To see if the Flow ran successfully, open the History section (the run may stay pending for several minutes, depending on the size of the audio file).


  8. Step 5: Put the transcript together

    IBM Watson Speech Recognition returns the transcript in the form of a list of sentences, so we need to append them to each other to get a single piece of text that can be applied as metadata.

    For this we’re going to use variables. Access the settings of the flow and create a variable named fulltranscript.


  9. Step 6: Iterate over the list of Watson’s results

    We need to append the single sentences extracted by IBM Watson to each other. To do this:

    • Add a Loop step to the flow
    • In the List field, pass the results data pill that you can find inside the Retrieve Text From Speech Recognition action


  10. Step 7: Concatenate the sentences

    Now we need to concatenate every sentence. We'll use the previously created variable, fulltranscript, to store the text and incrementally update it, so that it eventually contains the full transcript.

    The logic behind this is the following: for each result, fulltranscript will be updated by appending the new piece of text to the variable's current value.

    Consider a list of sentences “Hi”, “my name”, “is Giuliano”. Before the loop starts, fulltranscript is the empty string “”. The execution will go like this:

    1. “” + “Hi”
    2. “Hi” + “my name”
    3. “Hi my name” + “is Giuliano”

    Final result “Hi my name is Giuliano”.

    Let’s do this in our flow:

    • Hover your cursor over the + icon of the Loop step and select In, then Action (actions inside a Loop are executed as many times as there are items in the list)
    • Select the Variable component and then pick the Set/Update flow variable
    • Select the fulltranscript variable from the dropdown
    • The new variable value will be set to the current value + the item processed by the Loop.
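    The loop above is plain string accumulation. A minimal sketch of the same logic, joining with a space on the assumption that the sentences don't carry their own separators:

```python
sentences = ["Hi", "my name", "is Giuliano"]

# fulltranscript starts empty, exactly like the flow variable
fulltranscript = ""
for item in sentences:
    # Set/Update flow variable: current value + the item being processed
    fulltranscript = item if not fulltranscript else fulltranscript + " " + item

print(fulltranscript)  # -> Hi my name is Giuliano
```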

  11. Step 8: Apply the Metadata

    Finally we can grab the full text stored in the variable and apply it to the Metadata of the file on Box.

    • Add one last action outside of the Loop
    • Select the Box component and then the Create Metadata on File action
    • The File Id is the same we used for the Download File action, and we can grab it again from the results of the very first step of our workflow
    • Pick the Metadata template previously created (Audio data)
    • Stamplay will load the fields of the Metadata template and we simply need to pass there the content of the variable fulltranscript.
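    Behind the scenes, Create Metadata on File maps to Box's metadata API (POST to /2.0/files/{file_id}/metadata/enterprise/{templateKey}), with one value per template field in the body. A sketch of the request this step issues, with an assumed templateKey of audioData:

```python
import json

def build_metadata_request(access_token, file_id, transcript, template_key="audioData"):
    # Creates a metadata instance on the file; the body carries one
    # value for each field defined in the "Audio data" template.
    return {
        "method": "POST",
        "url": f"https://api.box.com/2.0/files/{file_id}/metadata/enterprise/{template_key}",
        "headers": {
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"transcript": transcript}),
    }

req = build_metadata_request("TOKEN", "123456789", "Hi my name is Giuliano")
```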


    Upload a new file and after a few minutes you'll see its metadata enriched by this powerful combo!

    Cool right?


  12. Conclusion

    At Stamplay we make it easy for people to automate processes and create high value integrations by tying together different apps.

    Sign up for a free trial and start automating your processes on Stamplay with Jira, Intercom and hundreds of other apps now.

    If you need help connecting your apps, or have an API that you want to make easy to connect with, tweet us at @stamplay and/or drop us an email at support@stamplay.com.

Join The Discussion