
Build a recommender with Apache Spark and Elasticsearch


Recommendation engines are among the best-known, most widely used, and highest-value applications of machine learning. Yet while many resources cover the basics of training a recommendation model, relatively few explain how to deploy such a model as part of a large-scale recommender system.


This developer pattern demonstrates the key elements of creating a recommender system by using Apache Spark and Elasticsearch. A Jupyter Notebook shows you how to use Spark to train a collaborative filtering recommendation model from ratings data stored in Elasticsearch, save the model factors to Elasticsearch, and then use Elasticsearch to serve real-time recommendations from the model.

Upon completion, you’ll know how to:

  • Ingest and index user event data into Elasticsearch by using the Elasticsearch Spark connector.
  • Load event data into Spark DataFrames and use Spark’s machine learning library (MLlib) to train a collaborative filtering recommender model.
  • Export the trained model into Elasticsearch.
  • Use a script score query in Elasticsearch to compute similar-item and personalized user recommendations, and combine recommendations with search and content filtering.
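
As a rough illustration of the last point, the request body for such a query might be assembled as follows. This is a minimal sketch, assuming Elasticsearch 7.x with the model factors stored in a `dense_vector` field. The field names `model_factor` and `title` and the helper `build_recommendation_query` are hypothetical and are not taken from the pattern's notebook. The argument form accepted by `cosineSimilarity` has changed between Elasticsearch versions, so check it against the documentation for your cluster.

```python
import json


def build_recommendation_query(query_vector, text_filter=None, size=10):
    """Build a script_score request body that ranks documents by cosine
    similarity between a query vector and a stored factor vector.

    Assumptions (hypothetical, for illustration only): factors are stored
    in a dense_vector field named "model_factor", and each movie document
    has a "title" text field.
    """
    # Only score documents that actually have a factor vector.
    inner = {"bool": {"filter": [{"exists": {"field": "model_factor"}}]}}
    # Optionally combine the recommendation with a free-text search.
    if text_filter:
        inner["bool"]["must"] = [{"match": {"title": text_filter}}]
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": inner,
                "script": {
                    # Adding 1.0 keeps the score non-negative, which
                    # Elasticsearch requires of script scores.
                    "source": "cosineSimilarity(params.query_vector, 'model_factor') + 1.0",
                    "params": {"query_vector": list(query_vector)},
                },
            }
        },
    }


# Example: a made-up 3-dimensional factor vector, filtered by a title search.
body = build_recommendation_query([0.1, -0.3, 0.5], text_filter="matrix", size=5)
print(json.dumps(body, indent=2))
```

Passing an item's factor vector as `query_vector` would yield similar-item results, while passing a user's factor vector would yield personalized results. The optional `match` clause shows one way search could be combined with the recommendation score.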



  1. Load the movie dataset into Spark.
  2. Use Spark DataFrame operations to clean up the dataset and load it into Elasticsearch.
  3. Using Spark MLlib, train a collaborative filtering recommendation model.
  4. Save the resulting model into Elasticsearch.
  5. Using Elasticsearch script score queries and vector scoring functions, generate some example recommendations. The Movie Database API is used to display movie poster images for the recommended movies.
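
To make steps 3 to 5 more concrete, the sketch below shows in plain Python the kind of computation a vector scoring function performs over saved model factors. It relies on the general property of matrix factorization models that a predicted preference is the dot product of a user factor vector and an item factor vector. The factor values and movie names are made up for illustration and do not come from a trained model, and the helper names are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Made-up 2-dimensional factors standing in for a trained model's output.
item_factors = {
    "Movie A": [0.9, 0.1],
    "Movie B": [0.8, 0.2],
    "Movie C": [0.1, 0.9],
}
user_factor = [1.0, 0.2]


def recommend_for_user(user_vec, items, k=2):
    """Rank items by predicted preference (user-item dot product)."""
    scored = [(title, dot(user_vec, vec)) for title, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def similar_items(title, items, k=2):
    """Rank the other items by cosine similarity to the given item."""
    target = items[title]
    scored = [
        (other, cosine(target, vec))
        for other, vec in items.items()
        if other != title
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


print(recommend_for_user(user_factor, item_factors))
print(similar_items("Movie A", item_factors))
```

In the pattern itself this scoring happens inside Elasticsearch rather than in client code, which is what lets recommendations be filtered and combined with ordinary search queries.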