In this developer code pattern, we will stream online shopping data and use it to track the products each customer has added to the cart. We will build a k-means clustering model with scikit-learn to group customers according to the contents of their shopping carts. Each customer's cluster assignment can then be used to recommend additional products.
Our application will be built using IBM Streams on IBM Cloud Pak® for Data. IBM Streams provides a built-in IDE, called Streams Flows, that allows you to visually create a streaming app. The IBM Cloud Pak for Data platform provides additional support, such as integration with multiple data sources, built-in analytics, Jupyter Notebooks, and machine learning.
To build and deploy our machine learning model, we will use a Jupyter Notebook in IBM Watson® Studio and a Watson Machine Learning instance. In our examples, both are running on IBM Cloud Pak for Data.
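The clustering step in the notebook can be sketched with scikit-learn. This is a minimal illustration, not the pattern's actual notebook: the product catalog, the sample carts, and the one-hot encoding of cart contents are all assumptions made here for the sake of a runnable example.

```python
# Hypothetical sketch: cluster customers by shopping-cart contents.
# PRODUCTS and the sample carts are illustrative, not from the pattern.
import numpy as np
from sklearn.cluster import KMeans

PRODUCTS = ["jeans", "shirt", "shoes", "hat", "socks"]

def cart_to_vector(cart):
    """One-hot encode a cart (list of product names) over the catalog."""
    return [1 if p in cart else 0 for p in PRODUCTS]

carts = [
    ["jeans", "shirt"],
    ["jeans", "shirt", "socks"],
    ["shoes", "hat"],
    ["shoes", "socks", "hat"],
]
X = np.array([cart_to_vector(c) for c in carts])

# Fit k-means; k=2 is an arbitrary choice for this toy data.
model = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

# Assign a new customer's cart to a cluster.
cluster = model.predict(np.array([cart_to_vector(["jeans", "socks"])]))[0]
```

In the pattern itself, a model trained like this is stored and deployed with Watson Machine Learning so that the streaming application can score against it.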
Using the Streams Flows editor, we will create a streaming app with the following operators:
- A Source operator that generates sample clickstream data
- A Filter operator that keeps only the “add to cart” events
- A Code operator where we use Python code to arrange the shopping cart items into an input array for scoring
- A WML Deployment operator to assign the customer to a cluster
- A Debug operator to demonstrate the results
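The Code operator's job of arranging cart items into an input array might look like the sketch below. It assumes the Streams Flows Python template's `init(state)`/`process(event, state)` hooks, and the event field names `customer_id` and `product_name` are assumptions about the clickstream schema, not the pattern's actual fields.

```python
# Hedged sketch of a Streams Flows Code operator body.
# Field names and the product catalog are illustrative assumptions.
PRODUCTS = ["jeans", "shirt", "shoes", "hat", "socks"]

def init(state):
    # Track each customer's cart contents across events.
    state["carts"] = {}

def process(event, state):
    # Accumulate this customer's "add to cart" items.
    cart = state["carts"].setdefault(event["customer_id"], set())
    cart.add(event["product_name"])
    # One-hot encode the cart as the input array for the WML deployment.
    event["input_array"] = [1 if p in cart else 0 for p in PRODUCTS]
    return event
```

The downstream WML Deployment operator would then consume `input_array` to assign the customer to a cluster.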
At a high level, the flow of the pattern is:

- User builds and deploys a machine learning model.
- User creates and runs an IBM Streams application.
- The Streams Flow UI shows streaming, filtering, and scoring in action.
Ready to get started? The README explains the steps to:
- Verify access to your IBM Streams instance on Cloud Pak for Data.
- Create a new project in Cloud Pak for Data.
- Build and store a model.
- Associate the deployment space with the project.
- Deploy the model.
- Create and run a Streams Flow application.
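Once the model is deployed, scoring requests sent to it follow the Watson Machine Learning `input_data` payload convention. The sketch below only builds the payload; the one-hot cart vector and the commented-out client call are assumptions for illustration, since an actual request needs credentials and a deployment ID.

```python
# Hedged sketch of a scoring payload for a WML online deployment.
# The one-hot cart vector is an assumption carried over from our model.
def build_scoring_payload(cart_vector):
    return {"input_data": [{"values": [cart_vector]}]}

payload = build_scoring_payload([1, 0, 0, 1, 0])

# With credentials, a notebook could score it via the
# ibm_watson_machine_learning client, e.g.:
#   client.deployments.score(deployment_uid, payload)
```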