Generative models built on deep learning are a hot topic right now. Whether they're countering fraud, powering spell check and language translation, simply making jobs easier, or serving as the focus of a predicted "information apocalypse" by technologist Aviv Ovadya, automated content generation is the future. I know that's a packed statement, so let's dive in a little, shall we?
Generative models, as the name suggests, generate new material. This could be video, audio, music, or even text, which is what this code pattern focuses on. As compelling as that is, these models carry both positive and negative consequences. One negative implication is that you can seemingly generate any material and make it look real, even to someone's or something's detriment. That means someone can make it look like something happened that in fact did not. It also means you can generate articles or text that were not authentically created, at least within our current frame of thought. As you can imagine, these generated creations can then be applied to harmful activities, such as fraudulent reviews, which companies like Yelp and Amazon must deal with constantly. But how do we counter these activities? With the exact same approach. I know it seems crazy, but by building the same kind of model that was used for harmful activity, a new model can be built on top of the original to counter the content being generated. Let's break this down with the fake-review example: the original fraudulent model creates fake reviews, and a second model trains on that output, learning what generated fake reviews look like so it can predict the likelihood that each new review is a fake. Cool, right? And now you can learn too. This code pattern provides an introduction to a generative language model that uses long short-term memory (LSTM) layers in a recurrent neural network (RNN).
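To make the detector idea concrete, here is a minimal pure-Python sketch of the second model: a naive word-frequency (Bayes-style) scorer trained on a handful of invented real and generated reviews. Everything here, the function names, the toy data, and the scoring scheme, is illustrative only; a production detector (and the code pattern itself) would use a learned neural model instead.

```python
from collections import Counter
import math

def train_counts(reviews):
    """Count word frequencies across a list of reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(review.lower().split())
    return counts

def fake_score(review, real_counts, fake_counts):
    """Return a log-odds score: positive means the review looks
    more like the generated (fake) training examples."""
    real_total = sum(real_counts.values())
    fake_total = sum(fake_counts.values())
    score = 0.0
    for word in review.lower().split():
        # Laplace smoothing so unseen words don't zero out the score.
        p_fake = (fake_counts[word] + 1) / (fake_total + 1)
        p_real = (real_counts[word] + 1) / (real_total + 1)
        score += math.log(p_fake / p_real)
    return score

# Toy training data: the "fake" reviews stand in for a generator's
# output, the "real" ones for actual customers (both invented here).
real = ["the pasta was cold but the staff were friendly",
        "great service and a cozy atmosphere"]
fake = ["best best restaurant amazing amazing food five stars",
        "amazing food best service five stars amazing"]

real_counts = train_counts(real)
fake_counts = train_counts(fake)

print(fake_score("amazing food five stars best", real_counts, fake_counts))   # positive: looks fake
print(fake_score("the staff were friendly", real_counts, fake_counts))        # negative: looks real
```

The key design point survives the simplification: the detector never needs to know *how* the generator works internally; it only needs a supply of the generator's output to learn its statistical fingerprint.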
For the unfamiliar, an RNN is a network whose hidden layers act as a form of memory, predicting the most probable next step given everything it has seen so far. Unlike feed-forward architectures such as convolutional neural networks (CNNs), which process each input in a single forward pass, an RNN feeds its hidden state back into itself at every step; that recurrent loop is the "memory" mentioned above (and training it involves unrolling the loop with backpropagation through time). For example, as the model builds a sentence, it uses that memory to consider what has already been written while generating what comes next. In our model, the RNN uses the characters already input to learn how to generate the next characters as output, cycling its hidden state back through to remember what it has learned. That output goes on to form a word, which eventually becomes a collection of words, then sentences and paragraphs, which you will see generated as you run the model and work through the code pattern. The LSTM layers improve on a plain RNN by retaining information over longer spans, which yields better words and sentences.
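The "use what came before to predict the next character" idea can be sketched without any neural network at all. The toy model below simply counts, for each short context string, which character followed it in the training text, then generates greedily. This is *not* the LSTM from the code pattern (which learns these patterns with trainable weights rather than raw counts); the text, function names, and context size are all invented for illustration.

```python
from collections import Counter, defaultdict

def build_model(text, context=3):
    """Map each length-`context` substring to a Counter of the
    characters that followed it in the training text."""
    model = defaultdict(Counter)
    for i in range(len(text) - context):
        model[text[i:i + context]][text[i + context]] += 1
    return model

def generate(model, seed, length=40):
    """Greedily extend `seed` by repeatedly appending the most
    frequent next character for the current context."""
    out = seed
    context = len(seed)
    for _ in range(length):
        nexts = model.get(out[-context:])
        if not nexts:
            break  # context never seen in training; stop generating
        out += nexts.most_common(1)[0][0]
    return out

text = "the soup was hot and the salad was fresh and the bread was warm"
model = build_model(text, context=3)
print(generate(model, "the"))
```

A real RNN generalizes where this lookup table cannot: contexts it has never seen verbatim still produce sensible predictions, because the hidden state encodes patterns rather than exact strings.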
Using the code pattern, you'll learn how to use TensorFlow and Keras to generate a restaurant review. While the scope of this code pattern is limited to an introduction to text generation, it provides a strong foundation for learning how to build a language model. I hope it motivates you to try it out and make it even better!
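One last piece of the generation loop worth previewing: a character-level model like the one in the code pattern outputs a probability distribution over the next character, and text is produced by *sampling* from it, commonly with a "temperature" knob that controls how adventurous the choices are. Below is a standard-library sketch of that sampling step; the example distribution and parameter values are invented, and the actual code pattern performs this with the model's softmax output.

```python
import math
import random

def sample(probs, temperature=1.0, rng=random):
    """Sample an index from a probability distribution after
    reweighting by `temperature` (lower = more conservative)."""
    logits = [math.log(p) / temperature for p in probs]
    m = max(logits)
    # Softmax with the max subtracted for numerical stability.
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    r = rng.random()
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        if r < cumulative:
            return i
    return len(weights) - 1

probs = [0.1, 0.6, 0.3]  # e.g., a model's softmax over three characters
random.seed(0)
print(sample(probs, temperature=0.2))  # low temperature: almost always picks index 1
```

At temperature 1.0 the model samples from its raw distribution; pushing the temperature down sharpens it toward the single most likely character, and pushing it up flattens it, trading coherence for variety, a trade-off you can experiment with once the model is trained.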