
Build a recommender with Apache Spark and Elasticsearch

Summary

Recommendation engines are among the best-known, most widely used, and highest-value applications of machine learning. Yet while many resources cover the basics of training a recommendation model, relatively few explain how to deploy these models to build a large-scale recommender system.

Description

This developer pattern demonstrates the key elements of creating a recommender system by using Apache Spark and Elasticsearch. A Jupyter Notebook shows you how to use Spark to train a collaborative filtering recommendation model from ratings data stored in Elasticsearch, save the model factors to Elasticsearch, and then use Elasticsearch to serve real-time recommendations from the model.

Upon completion, you’ll know how to:

  • Ingest and index user event data into Elasticsearch by using the Elasticsearch Spark connector.
  • Load event data into Spark DataFrames and use Spark’s machine learning library (MLlib) to train a collaborative filtering recommender model.
  • Export the trained model into Elasticsearch.
  • Compute personalized user recommendations and similar-item recommendations by using a custom Elasticsearch plugin, and combine those recommendations with search and content filtering.
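
To make the last point more concrete, here is a minimal sketch of how learned factor vectors can be turned into recommendations. It is plain Python and does not use Spark or Elasticsearch. It assumes that a personalized score is the dot product of a user vector and an item vector, and that item-to-item similarity is cosine similarity between item vectors, which is a common convention for collaborative filtering factors. The function names, the `keep` filter, and the toy data are hypothetical and are not taken from the notebook or the plugin.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length factor vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def recommend_for_user(
    user_vec: Vector,
    item_factors: Dict[str, Vector],
    keep: Optional[Callable[[str], bool]] = None,
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Score items by dot product with the user vector.

    `keep` is a hypothetical stand-in for a search or content filter:
    only items for which it returns True are scored.
    """
    scored = [
        (item_id, dot(user_vec, vec))
        for item_id, vec in item_factors.items()
        if keep is None or keep(item_id)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def similar_items(
    item_id: str, item_factors: Dict[str, Vector], k: int = 2
) -> List[Tuple[str, float]]:
    """Rank the other items by cosine similarity to the given item."""
    target = item_factors[item_id]
    scored = [
        (other, cosine(target, vec))
        for other, vec in item_factors.items()
        if other != item_id
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy factors (assumed values, two latent dimensions).
ITEM_FACTORS = {
    "movie_a": [1.0, 0.0],
    "movie_b": [0.8, 0.2],
    "movie_c": [0.0, 1.0],
}
USER_VEC = [1.0, 0.1]

if __name__ == "__main__":
    print(recommend_for_user(USER_VEC, ITEM_FACTORS))
    print(similar_items("movie_a", ITEM_FACTORS))
```

In the pattern itself, this scoring is done inside Elasticsearch by the custom plugin, so the filter corresponds to an ordinary Elasticsearch query rather than a Python callable.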

Flow


  1. Load the movie dataset into Spark.
  2. Use Spark DataFrame operations to clean up the dataset and load it into Elasticsearch.
  3. Using Spark MLlib, train a collaborative filtering recommendation model.
  4. Save the resulting model into Elasticsearch.
  5. Using Elasticsearch queries and a custom vector scoring plugin, generate example recommendations. The Movie Database API is used to display poster images for the recommended movies.
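
As a rough illustration of step 4, the sketch below shows one way model factors could be shaped into a request body for the Elasticsearch bulk API, which expects newline-delimited JSON with an action line followed by a document line. The pattern itself writes to Elasticsearch through the Spark connector, so this only illustrates what the stored documents might look like. The index name, the `factor` and `version` field names, and the helper function are all assumptions, and the vector scoring plugin may require a different encoding for the factor field.

```python
import json
from typing import Dict, List


def factors_to_bulk_body(
    index: str, factors: Dict[str, List[float]], model_version: str
) -> str:
    """Build a newline-delimited JSON body for the Elasticsearch bulk API.

    Each document gets an action line and a source line. The field names
    "factor" and "version" are hypothetical, chosen for illustration only.
    """
    lines = []
    for doc_id, vector in factors.items():
        # Action line: index this document under the given ID.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the factor vector plus the model version that produced it.
        lines.append(json.dumps({"factor": vector, "version": model_version}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = factors_to_bulk_body("movies", {"1": [0.1, 0.2]}, "v1")
    print(body, end="")
```

Storing a model version next to each vector is one way to make it possible to tell which training run produced the factors being served.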
Rich Hagarty
Nick Pentreath