Deploy a serverless multilingual conference room  

Create a communications channel to allow clients who speak different languages to seamlessly communicate with each other

Last updated | By Kalonji Bankole

Description

This developer pattern proposes a method to create a communications channel that lets clients who speak different languages communicate seamlessly with each other. This can be particularly useful in meeting rooms and conference calls where the participants are located in different countries, because translated subtitles or audio can be generated and delivered in real time.

Overview

Have you ever wished there was a way for online game teammates who speak different languages to communicate effectively? Or how about in chat rooms with many clients, such as a Slack/Sametime/Zoom chat group? Live broadcasts like on YouTube or Twitch? Or maybe online classes/webinars? This developer pattern proposes a way to do just that: create a communications channel to allow different-language clients to seamlessly communicate with each other.

This pattern leverages the MQTT messaging protocol, which allows each client to publish and “subscribe” to one or more channels. The makeup of the channel name identifies each client’s requested language and payload type (fromClient/french/audio, for example).
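As a rough illustration of this channel naming, the sketch below builds and parses channel names with three parts, matching the fromClient/french/audio example. The helper names (`Channel`, `parse_channel`, `build_channel`) and the "text" payload type are assumptions made for illustration; they are not taken from the pattern's code.

```python
from typing import NamedTuple

# Assumed payload types; the text above only shows "audio" explicitly.
PAYLOAD_TYPES = {"audio", "text"}


class Channel(NamedTuple):
    direction: str     # e.g. "fromClient"
    language: str      # e.g. "french"
    payload_type: str  # e.g. "audio"


def parse_channel(topic: str) -> Channel:
    """Split a name such as 'fromClient/french/audio' into its three parts."""
    parts = topic.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <direction>/<language>/<payload>, got {topic!r}")
    direction, language, payload_type = parts
    if payload_type not in PAYLOAD_TYPES:
        raise ValueError(f"unsupported payload type: {payload_type!r}")
    return Channel(direction, language.lower(), payload_type)


def build_channel(direction: str, language: str, payload_type: str) -> str:
    """Join the three parts back into a channel name."""
    return "/".join((direction, language.lower(), payload_type))


if __name__ == "__main__":
    print(parse_channel("fromClient/french/audio"))
```

Keeping the language and payload type in the channel name means a client can state what it sends or wants to receive simply by choosing which channels to publish and subscribe to.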

The channel that each submitted message is sent to determines the sequence of cloud functions that will be called – for example, submitting a message to fromClient/english/audio will run the audio payload through the Watson® Speech to Text service, forward that result to the translator service, and distribute the translated result to all listening clients.
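The routing idea can be sketched as a lookup from payload type to an ordered list of steps. Everything in this example is a stand-in: the three step functions only imitate the Speech to Text, translation, and publishing stages, and the language list and the `toClients/...` output channel names are hypothetical. In the pattern itself these stages are cloud functions, not local Python calls.

```python
from typing import Callable, Dict, List

# Assumed language list, for illustration only.
SUPPORTED_LANGUAGES = ("english", "french", "spanish")


def speech_to_text(message: dict) -> dict:
    # Placeholder: a real action would call the Watson Speech to Text service.
    transcript = f"<transcript of {len(message['payload'])} bytes>"
    return {**message, "text": transcript}


def translate(message: dict) -> dict:
    # Placeholder: a real action would call the translator service per language.
    targets = [lang for lang in SUPPORTED_LANGUAGES if lang != message["language"]]
    translations = {lang: f"[{lang}] {message['text']}" for lang in targets}
    return {**message, "translations": translations}


def publish(message: dict) -> dict:
    # Placeholder: a real action would publish to the MQTT broker.
    # The "toClients/<language>/text" names are hypothetical.
    topics = [f"toClients/{lang}/text" for lang in message["translations"]]
    return {**message, "published_to": topics}


# The payload type in the channel name selects the sequence of steps.
PIPELINES: Dict[str, List[Callable[[dict], dict]]] = {
    "audio": [speech_to_text, translate, publish],
    "text": [translate, publish],
}


def handle(topic: str, payload: bytes) -> dict:
    """Run the message through the steps selected by its channel name."""
    _, language, payload_type = topic.split("/")
    message = {"language": language, "payload": payload}
    if payload_type == "text":
        message["text"] = payload.decode("utf-8")
    for step in PIPELINES[payload_type]:
        message = step(message)
    return message


if __name__ == "__main__":
    print(handle("fromClient/english/text", b"hello")["published_to"])
```

A text message skips the transcription step, so the same table covers both payload types without branching inside the individual steps.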

When you have completed this pattern, you will better understand how to:

  • Deploy IBM Cloud Functions actions/triggers
  • Interact with the Watson IoT platform
  • Set up a Cloud Foundry application

Flow

  1. Message received from a client, which can be a web browser, CLI, OpenWhisk action, SMS text, etc.
  2. If message payload contains an audio file, it is transcribed to text.
  3. Transcribed text is translated to other supported languages.
  4. If message is sent via SMS, sender phone number is added to an etcd key-value store, which maintains a list of subscribers’ phone numbers and their respective languages. An adjustable TTL value (300 seconds here) removes a number from the store if the subscriber does not participate in the conversation within that time.
  5. Translated messages/audio streams are published to various channels on the MQTT broker, which then distributes the messages among subscribing clients.
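The subscriber expiry in step 4 can be sketched with a small in-memory store. This is only an imitation of the behavior described above: etcd expires keys itself, whereas the hypothetical `SubscriberStore` class below tracks expiry times by hand. The class name, method names, and phone numbers are all made up for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class SubscriberStore:
    """In-memory stand-in for a key-value store whose entries expire after a TTL."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised without waiting
        # phone number -> (language, expiry time)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def touch(self, phone: str, language: str) -> None:
        """Add a subscriber, or reset the TTL when they send another message."""
        self._entries[phone] = (language, self._clock() + self._ttl)

    def active(self) -> Dict[str, str]:
        """Drop expired subscribers and return the remaining phone -> language map."""
        now = self._clock()
        self._entries = {
            phone: entry for phone, entry in self._entries.items() if entry[1] > now
        }
        return {phone: language for phone, (language, _) in self._entries.items()}


if __name__ == "__main__":
    store = SubscriberStore(ttl_seconds=300)
    store.touch("+15550100", "french")
    print(store.active())
```

Each incoming SMS calls `touch`, so a subscriber who keeps participating is never dropped, while one who goes quiet for longer than the TTL stops receiving translated messages.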

Related Blogs

Two “edgy” AI TensorFlow models for you!

The global Call for Code is well underway, and we want to share some visual recognition models that could help you. These AI models can operate on the edge, which could be particularly useful for this year’s theme: disaster preparedness. How could visual recognition help in relief work? From satellite and drone imagery analysis, to classifying...

Leveraging the power of AI at Unite Berlin

Last week, from June 19 – 21, we were at Unity’s premiere in Berlin: Unite 2018. This conference brought together Unity’s video game and development community. Unity touches 770 million gamers all over the world and is the market leader for consumer AR and VR use cases and is also rapidly emerging as the market...

Related Links

Glitch

Check out a demo of Watson Text to Speech where you can also create your own app.

Watson Text to Speech Demo

The service understands text and natural language to generate synthesized audio output complete with appropriate cadence and intonation.

Medium

Blog post discusses the duality between serverless functions and APIs.

IBM Cloud Blog

Post introduces serverless composition for IBM Cloud Functions.

Codeship

Blog post covers how IBM Cloud Functions might be the best solution for your tech stack.