Three types of customization
Language Translation now supports glossary customization, which was announced in August, along with two new types of corpus-based customization. Corpus-based customization lets you train translation models: the service learns statistical patterns from a large body of aligned, translated text, putting the power of cognitive computing in the hands of Watson Language Translation users.
The three types of Language Translation customization now supported are:
- Forced Glossary – Lets you define custom term-translation pairs so that a given source term is always translated to the target term you specify.
- Parallel Corpus – Entries are treated as a parallel corpus instead of a glossary. A parallel corpus adds examples and context that the translation model learns from to improve the translated responses.
- Monolingual Corpus – Adds terms and sentences to the target language model. This enables you to expand the dictionary of the target language.
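To make the difference between the three types concrete, here is a minimal Python sketch of what each kind of training data might look like, plus a toy function that mimics the "always translate this term to that term" behavior of a forced glossary. This is an illustration only, not how the service implements customization. The sample entries, the variable names, and the `apply_forced_glossary` helper are all hypothetical, and the sketch assumes glossary keys are stored in lowercase.

```python
import re

# Forced glossary: source term -> the target term that must always be used.
# (Hypothetical sample data; keys are assumed to be lowercase.)
forced_glossary = {
    "cloud": "nube",
    "dashboard": "panel de control",
}

# Parallel corpus: aligned (source, target) sentence pairs a model learns from.
parallel_corpus = [
    ("Open the dashboard.", "Abra el panel de control."),
    ("The service runs in the cloud.", "El servicio se ejecuta en la nube."),
]

# Monolingual corpus: target-language text only, with no source side.
monolingual_corpus = [
    "El panel de control muestra el uso del servicio.",
]


def apply_forced_glossary(text, glossary):
    """Replace each whole-word glossary term with its fixed translation.

    Longer terms are tried first so multi-word entries win over their parts.
    Matching is case-insensitive; this is a toy stand-in, not a translator.
    """
    if not glossary:
        return text
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub(lambda m: glossary[m.group(0).lower()], text)


print(apply_forced_glossary("Open the Dashboard in the cloud", forced_glossary))
```

The glossary is a strict lookup, while the two corpus types are only examples for a model to learn from. That is why a forced glossary can guarantee a specific term and a corpus can only influence the result.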
Web-based customization tool
In addition to introducing the customization capability through our APIs, we have built a web-based tool that lets non-developers perform customization. With this tool, anyone with the required training data (whether for Forced Glossary, Parallel Corpus, or Monolingual Corpus) and basic computer literacy can create language translation models for a specific domain or use case. The tool becomes available once a Language Translation instance has been set up in Bluemix. See the documentation links below for using the tool.
Languages supported for customization
We now support customization for the following language pairs:
- English to / from Spanish
- English to / from French
- English to / from Brazilian Portuguese
- English to / from Arabic
For some of you, customization will mean a minor change to a model so that the service works better for you. For others, it will mean building substantively new models that take full advantage of our new machine learning capabilities, either to use directly or to license as a revenue source.
- Support for Mandarin and Brazilian Portuguese in Speech to Text, including wideband and narrowband models for each.
- Support for OPUS compressed audio in Speech to Text. (See the documentation for Audio formats.)
- SDKs for both Speech services that enable easier integration for native Android and iOS developers. These SDKs are available from the speech-android-sdk repository and the speech-ios-sdk repository in the watson-developer-cloud namespace on GitHub.
IBM is placing the power of Watson in the hands of developers and an ecosystem of partners, entrepreneurs, tech enthusiasts and students with a growing platform of Watson services (APIs) to create an entirely new class of apps and businesses that make cognitive computing systems the new computing standard.