Build a recommendation engine using Apache Spark and Elasticsearch
Learn how to use Apache Spark and new vector scoring functions in Elasticsearch to build and deploy recommender models.
Recommendation engines are among the most well-known, widely used, and highest-value use cases for applying machine learning. Yet while many resources cover the basics of training a recommendation model, relatively few explain how to deploy these models to create a large-scale recommender system.
The IBM Developer code pattern Build a recommender with Apache Spark and Elasticsearch illustrates how to build and deploy just such a recommender system.
Native vector scoring in Elasticsearch
While we won’t go into the details here (for that, check out the code pattern and the many resources available in the related GitHub repository), the core of using Elasticsearch for this type of recommendation model is the ability to compute certain functions over numeric vectors. Given a “query” vector, you must compute a function between the query vector and the target vector in each document in the Elasticsearch index. The result of this function is then used to rank (or score) the documents, just like a normal Elasticsearch query. These vector functions are one of the key ingredients behind the computation of recommendations such as related content (or “people who like this also liked …”) and personalized user recommendations (such as “recommended for you”).
By using these vector scoring functions within Elasticsearch, you get all of the power of a machine learning recommendation model, combined with the search and filtering functions of a search engine, all in one system.
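To make the scoring idea concrete, here is a minimal pure-Python sketch of what ranking documents by a vector function amounts to. It is not the code pattern's implementation. Cosine similarity is used as one example of such a function, and the document IDs, vectors, and helper names (`rank_documents`, `item_factors`, `user_factor`) are hypothetical values chosen for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length numeric vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either vector is all zeros)."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def rank_documents(query_vector, documents, score_fn=cosine_similarity, top_n=3):
    """Score every document vector against the query vector, highest score first."""
    scored = [(doc_id, score_fn(query_vector, vec)) for doc_id, vec in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical 2-dimensional vectors standing in for learned model factors.
item_factors = {
    "movie_a": [1.0, 0.0],
    "movie_b": [0.8, 0.6],
    "movie_c": [0.0, 1.0],
    "movie_d": [-1.0, 0.0],
}

# "Recommended for you": the query vector is a user's vector.
user_factor = [1.0, 0.0]
top = rank_documents(user_factor, item_factors, top_n=2)

# "People who like this also liked": the query vector is another item's vector.
similar = rank_documents(item_factors["movie_c"], item_factors, top_n=2)
```

In this sketch, the only difference between personalized and related-content recommendations is which vector is used as the query. Elasticsearch performs the equivalent per-document computation inside the index, so the results can be combined with ordinary search filters.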
The code pattern was originally published in 2017. At that time, the only way to compute the necessary functions for vectors was through a custom plug-in extension for Elasticsearch, as well as a somewhat clumsy encoding hack to efficiently index the vectors. With the release of Elasticsearch 7.0, dense vectors were added as a supported field type. Then, from version 7.3, these fields could be used in document scoring through vector functions. For another example in the domain of semantic search, see this blog post.
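As a rough illustration of the native approach, the sketch below builds an index mapping with a dense_vector field and a script_score query body as plain Python dictionaries. The field names (`model_factor`, `genres`), the helper function names, and the vector values are assumptions made for this example, not names taken from the code pattern. The script passes the field name as a string, which is the form used in later 7.x releases. Releases 7.3 to 7.5 referenced the field as `doc['model_factor']` instead, so check the documentation for your version.

```python
def build_index_mapping(field="model_factor", dims=20):
    """Index mapping with one dense_vector field (field name is hypothetical)."""
    return {
        "mappings": {
            "properties": {
                field: {"type": "dense_vector", "dims": dims},
            }
        }
    }


def build_recommendation_query(query_vector, field="model_factor",
                               filter_query=None, size=10):
    """Request body that scores documents by cosine similarity to query_vector.

    filter_query is any ordinary Elasticsearch query used to restrict the
    candidate documents; by default every document is scored.
    """
    inner_query = filter_query if filter_query is not None else {"match_all": {}}
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": inner_query,
                "script": {
                    # Adding 1.0 keeps scores non-negative, which Elasticsearch requires.
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": list(query_vector)},
                },
            }
        },
    }


mapping = build_index_mapping(dims=3)

# Hypothetical example: restrict recommendations to documents tagged "comedy".
body = build_recommendation_query(
    [0.1, 0.2, 0.3],
    filter_query={"term": {"genres": "comedy"}},
    size=5,
)
```

These dictionaries could then be sent to Elasticsearch as JSON with whichever client you use: the mapping when creating the index, and the query body as a search request. The inner query is where the search and filtering side of the engine meets the model's vector scoring.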
After a few years, document ranking with vector functions is now natively supported in Elasticsearch, doing away with the need for a custom plug-in!
Updated code pattern
To take advantage of this exciting new function, we’ve updated the code pattern to reflect the latest version of Elasticsearch. Most of the code pattern is the same, but it is even simpler to use because we can remove a step from the setup instructions. Perhaps more importantly, this makes it significantly easier to use this approach in a real application. Many Elasticsearch production environments are locked down with respect to plug-in extensions, making it very difficult to use the vector scoring plug-in. Now, all you need is an “out-of-the-box” installation of Elasticsearch.