
Build an interactive product recommender with Spark and PixieDust

Most websites that sell products online show you a list of items that you might be interested in. The better the recommendations, the more likely you are to buy one of these items, which increases the seller's sales. But how are these recommendations created?

The most straightforward way is to use the purchase data of all customers. From this data, you can create groups of customers who have bought similar products. A statistical method for this is called clustering: it creates groups in which the customers within a group are more similar to each other than to the customers in other groups. One algorithm you might use is k-means clustering, where each customer is assigned to the cluster with the nearest mean. In the following example, you can see how many of products A and B each customer has bought.

[Figure: number of products A and B bought by each customer]

The dots are customers, and they can be clustered into n groups, where you choose n to be as many groups as you need. The following example shows what this might look like. Note that this is only a sketch; a real k-means algorithm would probably produce different clusters.

[Figure: sketch of the customers grouped into clusters]
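To make "the cluster with the nearest mean" concrete, here is a minimal k-means sketch in plain Python for the two-product case. It is not the Spark ML implementation used in this code pattern. The purchase counts are made up for illustration, and the farthest-point initialization is an assumption chosen to keep the result deterministic; library implementations typically use randomized initialization such as k-means++.

```python
import math

def init_centroids(points, k):
    # Assumed farthest-point initialization: start from the first point, then
    # repeatedly add the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids

def kmeans(points, k, max_iterations=100):
    """Cluster points with k-means (Lloyd's algorithm); returns (centroids, labels)."""
    centroids = init_centroids(points, k)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each customer joins the cluster with the nearest mean.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its customers.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels

# Hypothetical data: (number of product A, number of product B) per customer.
customers = [(1, 2), (2, 1), (1, 1), (8, 9), (9, 8), (9, 9)]
centroids, labels = kmeans(customers, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each pass alternates two steps: assign every customer to the nearest centroid, then move each centroid to the mean of its customers. The loop stops once the centroids no longer move.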

With a machine learning algorithm, you can do the same for many more products, where each additional product adds an extra dimension to the previous example. This code pattern uses the k-means algorithm from Spark ML.
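Before k-means can run, each customer's purchases need to become a point in that many-dimensional space. The sketch below shows the reshaping in plain Python, assuming purchases arrive as (customer, product) records; the data and the purchase_vectors helper are hypothetical. In a Spark notebook you would more likely do the same with DataFrame operations (for example a groupBy with pivot) and then assemble the columns into feature vectors for Spark ML.

```python
from collections import Counter

def purchase_vectors(purchases):
    """Turn (customer, product) records into one count vector per customer.

    Every distinct product becomes one dimension, so each new product adds
    one coordinate to every customer's vector.
    """
    products = sorted({product for _, product in purchases})
    counts = Counter(purchases)  # (customer, product) -> number of purchases
    customers = sorted({customer for customer, _ in purchases})
    vectors = {
        customer: [counts[(customer, product)] for product in products]
        for customer in customers
    }
    return products, vectors

# Hypothetical purchase records: one (customer, product) pair per item bought.
purchases = [("c1", "A"), ("c1", "A"), ("c1", "B"), ("c2", "B"), ("c2", "C")]
products, vectors = purchase_vectors(purchases)
print(products)  # ['A', 'B', 'C']
print(vectors)   # {'c1': [2, 1, 0], 'c2': [0, 1, 1]}
```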

We are not there yet. After clustering all of the customers into groups, we still don't have a list of products to recommend. A simple way to create one is to rank the products by how often they were bought within a cluster and recommend the top ones, leaving out any products that are already in the customer's basket.
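The ranking step takes only a few lines. This sketch assumes you already have the purchases of all customers in the same cluster as a flat list of product names; recommend is a hypothetical helper, not part of Spark ML.

```python
from collections import Counter

def recommend(cluster_purchases, basket, n=3):
    """Recommend the n most bought products in a cluster that are not in the basket.

    cluster_purchases: product names bought by customers in one cluster,
    one entry per purchase. basket: products the customer already has.
    """
    popularity = Counter(cluster_purchases)
    # most_common() with no argument returns all products, most bought first.
    ranked = [product for product, _ in popularity.most_common() if product not in basket]
    return ranked[:n]

# Hypothetical purchases within one cluster.
cluster_purchases = ["milk"] * 4 + ["bread"] * 3 + ["eggs"] * 2 + ["jam"]
print(recommend(cluster_purchases, basket={"milk"}, n=2))  # ['bread', 'eggs']
```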

One of my favorite tools for cleaning data and building a model is a Jupyter Notebook, where you can easily run code, add comments, and explore data with charts and tables. You can run Notebooks in Watson Studio, where you can use a Spark kernel.

After building the model, you probably want to show or share it with others. If you want to use your model in a web application to recommend products, you can use Watson Machine Learning. Directly from the Notebook, you can deploy the model as an API that you can then call from anywhere.

But before using this API in an application, it's a good idea to test the model. You can do this in a Notebook by running code, but if you want others to understand what you have built, a PixieApp shows how the recommendation engine works much more clearly. The following example shows an interactive PixieApp of a shopping basket where you can add and delete products and then create a list of recommendations based on the contents of the basket.

[Figure: interactive PixieApp shopping basket with product recommendations]
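Behind an interactive basket like this one there is only a little logic: keep track of the basket contents, place the basket in the nearest cluster (which is how a trained k-means model classifies a new point), and list that cluster's popular products that are not yet in the basket. The sketch below is plain Python, not the PixieDust API, and it does the cluster assignment locally instead of calling the deployed Watson Machine Learning model. The Basket class, centroids, product list, and per-cluster rankings are all hypothetical stand-ins for the model's output.

```python
import math
from collections import Counter

# Hypothetical product order; each product is one dimension of a basket vector.
PRODUCTS = ["bread", "eggs", "jam", "milk"]

class Basket:
    """Plain-Python stand-in for the PixieApp basket logic (not PixieDust code)."""

    def __init__(self, centroids, ranked_products):
        self.centroids = centroids              # one centroid vector per cluster
        self.ranked_products = ranked_products  # cluster -> products, most bought first
        self.items = Counter()                  # product -> quantity in the basket

    def add(self, product):
        self.items[product] += 1

    def remove(self, product):
        # Decrease the quantity, dropping the product once none are left.
        if self.items[product] > 1:
            self.items[product] -= 1
        else:
            self.items.pop(product, None)

    def cluster(self):
        # Assign the basket to the cluster whose centroid is nearest.
        vector = [self.items[p] for p in PRODUCTS]
        return min(range(len(self.centroids)),
                   key=lambda c: math.dist(vector, self.centroids[c]))

    def recommendations(self, n=2):
        ranked = self.ranked_products[self.cluster()]
        return [p for p in ranked if p not in self.items][:n]

# Hypothetical model output: cluster 0 buys bread and jam, cluster 1 eggs and milk.
basket = Basket(
    centroids=[[2, 0, 1, 0], [0, 2, 0, 2]],
    ranked_products={0: ["bread", "jam", "milk", "eggs"],
                     1: ["milk", "eggs", "bread", "jam"]},
)
basket.add("bread")
print(basket.recommendations())  # ['jam', 'milk']
basket.remove("bread")
basket.add("milk")
basket.add("milk")
print(basket.recommendations())  # ['eggs', 'bread']
```

Every change to the basket can move it into a different cluster, which is why the recommendations are recomputed after each add or delete.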

If you want to learn more, this code pattern shows you all of the code that you need to build a recommender engine from customer data in a Jupyter Notebook. You will learn how to use Spark to build a k-means model, deploy this model to Watson Machine Learning, and then use this model through an API to build an interactive shopping cart as a PixieApp.

Margriet Groenendijk