Build a recommender with Apache Spark and Elasticsearch
Walk through a Jupyter Notebook that demonstrates how to use Apache Spark and Elasticsearch to train and use a recommendation model
Recommendation engines are among the best-known, most widely used, and highest-value applications of machine learning. Yet although many resources cover the basics of training a recommendation model, relatively few explain how to deploy such a model as part of a large-scale recommender system.
This developer pattern demonstrates the key elements of creating a recommender system by using Apache Spark and Elasticsearch. A Jupyter Notebook shows you how to use Spark to train a collaborative filtering recommendation model from ratings data stored in Elasticsearch, save the model factors to Elasticsearch, and then use Elasticsearch to serve real-time recommendations from the model.
Upon completion, you’ll know how to:
- Ingest and index user event data into Elasticsearch by using the Elasticsearch Spark connector.
- Load event data into Spark DataFrames and use Spark’s machine learning library (MLlib) to train a collaborative filtering recommender model.
- Export the trained model into Elasticsearch.
- Use a script score query in Elasticsearch to compute similar-item and personalized user recommendations, and combine those recommendations with search and content filtering.
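To make the idea behind these objectives more concrete: a collaborative filtering model of this kind represents each user and each item as a factor vector, so producing recommendations comes down to comparing vectors. The following is a minimal pure-Python sketch of that arithmetic, not code from the notebook. The factor values and the names `item_factors`, `user_factors`, `similar_items`, and `recommend_for_user` are made up for illustration; in the pattern itself the vectors come from the trained Spark model and the scoring runs inside Elasticsearch, possibly with a different scoring function.

```python
import math

# Hypothetical factor vectors; a trained model would typically use more dimensions.
item_factors = {
    "movie_a": [0.9, 0.1, 0.0],
    "movie_b": [0.8, 0.2, 0.1],
    "movie_c": [0.0, 0.1, 0.9],
}
user_factors = {"user_1": [1.0, 0.0, 0.2]}


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def similar_items(item_id, k=2):
    """Rank the other items by cosine similarity to item_id's factor vector."""
    query = item_factors[item_id]
    scores = [
        (other, cosine(query, vec))
        for other, vec in item_factors.items()
        if other != item_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


def recommend_for_user(user_id, k=2):
    """Rank items by the dot product of the user's and each item's factors."""
    query = user_factors[user_id]
    scores = [(item, dot(query, vec)) for item, vec in item_factors.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


print(similar_items("movie_a"))
print(recommend_for_user("user_1"))
```

Scanning every item like this only works for toy data. Storing the factors in a search engine lets the same comparison be restricted to documents that match a search or filter, which is the role Elasticsearch plays in this pattern.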
The notebook works through the following steps:
- Load the movie dataset into Spark.
- Use Spark DataFrame operations to clean up the dataset and load it into Elasticsearch.
- Use Spark MLlib to train a collaborative filtering recommendation model.
- Save the resulting model into Elasticsearch.
- Use Elasticsearch script score queries and vector scoring functions to generate some example recommendations. The Movie Database API is used to display poster images for the recommended movies.
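As a rough illustration of the last step, the sketch below builds the request body for an Elasticsearch `script_score` query that ranks documents by cosine similarity to a query vector. It is a sketch under assumptions rather than the notebook's own query: the helper name `build_similarity_query`, the field name `model_factor`, and the example filter are hypothetical, and the script uses the `cosineSimilarity(params.query_vector, 'field')` form found in more recent Elasticsearch 7.x releases (some earlier releases expect `doc['field']` instead). It also assumes every matching document has a value for the vector field, which must be mapped as `dense_vector`.

```python
import json


def build_similarity_query(query_vector, vector_field="model_factor",
                           text_filter="*", size=10):
    """Build a script_score request body that ranks documents by cosine similarity.

    vector_field is a hypothetical field name; use whatever field holds the
    model factors in your index.
    """
    return {
        "size": size,
        "query": {
            "script_score": {
                # The inner query narrows the candidates, which is how the
                # recommendation can be combined with search or content filtering.
                "query": {"query_string": {"query": text_filter}},
                "script": {
                    # Adding 1.0 keeps the score non-negative, which
                    # Elasticsearch requires of script scores.
                    "source": f"cosineSimilarity(params.query_vector, '{vector_field}') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }


# Example: items similar to a (made-up) factor vector, limited to a text match.
body = build_similarity_query([0.9, 0.1, 0.0], text_filter="title:comedy", size=5)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the body of a search request with whichever Elasticsearch client you use. Passing a movie's factor vector gives similar-item results, and passing a user's factor vector gives personalized results.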