Word Embedding Generator


Machine learning algorithms usually expect numeric inputs. When a data scientist wants to use text to create a machine learning model, they must first represent that text as vectors of numbers. These vectors are called word embeddings. Swivel is a frequency-based word embedding algorithm that builds its vectors from a co-occurrence matrix. The underlying idea is that words with similar meanings tend to appear in similar contexts within a text corpus. As a result, the vector representations of words with similar meanings lie closer together than those of unrelated words.
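The co-occurrence idea can be illustrated with a small sketch. This is not the Swivel algorithm itself, which learns dense vectors from a much larger co-occurrence matrix. It only counts neighboring words in a toy corpus and compares the raw count vectors with cosine similarity. The corpus, the window size, and the helper names `cooccurrence_counts` and `cosine_similarity` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count how often each word appears within `window` tokens of another word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy corpus (hypothetical): "cat" and "dog" share contexts, "market" does not.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "stocks fell on the market",
]

counts = cooccurrence_counts(corpus)
sim_cat_dog = cosine_similarity(counts["cat"], counts["dog"])
sim_cat_market = cosine_similarity(counts["cat"], counts["market"])

print(f"cat vs dog:    {sim_cat_dog:.3f}")
print(f"cat vs market: {sim_cat_market:.3f}")
```

In this toy corpus "cat" and "dog" share most of their neighbors, so their count vectors end up more similar than those of "cat" and "market". Raw counts are dominated by frequent words such as "the", which is one reason practical methods reweight the counts before learning embeddings.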

This model enables you to train the Swivel algorithm on a preprocessed Wikipedia text corpus. To generate word embeddings from your own text corpus, see the instructions in the TensorFlow model repository.

Model Metadata

Domain | Application | Industry | Framework | Training Data | Input Data Format
Natural Language | Word Embeddings | General | TensorFlow | Any Text Corpus | Words



Component | License | Link
Model GitHub Repository | Apache 2.0 | LICENSE
Model Code (3rd party) | Apache 2.0 | TensorFlow Models
Data | CC BY-SA 3.0 | Wikipedia Text Dump

Options available for training this model

  • Train on IBM Cloud – Watson Machine Learning: follow the instructions in the GitHub README

Resources and Contributions

If you are interested in contributing to the Model Asset Exchange project or have any questions, please follow the instructions here.