Score streaming data with a machine learning model

This is part of the Learning path: Get started with IBM Streams.

Level  Topic                                                   Type
100    Introduction to IBM Streams                             Article
101    Create your first IBM Streams app without writing code  Tutorial
201    Ingest data from Apache Kafka                           Code pattern
301    Build a streaming app using a Python API                Code pattern
302    Access streaming data with REST services                Tutorial
401    Score streaming data with a machine learning model      Code pattern

Summary

In this developer code pattern, we stream online shopping data and use it to track the products that each customer has added to their cart. We build a k-means clustering model with scikit-learn to group customers according to the contents of their shopping carts. Each customer's cluster assignment can then be used to predict additional products to recommend.
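
To make the clustering idea concrete, here is a minimal sketch of how carts could be grouped with scikit-learn's `KMeans` and how a cluster might drive a recommendation. The product catalog, the sample carts, and the `recommend` helper are all hypothetical; the actual notebook in this pattern may encode carts and choose recommendations differently.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical product catalog; each cart becomes one row of 0/1 flags.
PRODUCTS = ["baby food", "diapers", "formula", "lotion", "milk", "bread"]

# Made-up carts: the first three lean toward baby items, the rest do not.
carts = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
])

# Two clusters is an arbitrary choice for this toy data set.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(carts)

def recommend(cart, model, top_n=2):
    """Suggest products that are common in the cart's cluster but missing from the cart."""
    cart = np.asarray(cart)
    cluster = int(model.predict(cart.reshape(1, -1))[0])
    center = model.cluster_centers_[cluster]
    # Rank products by their weight in the cluster center, skipping items already in the cart.
    ranked = [i for i in np.argsort(-center) if cart[i] == 0 and center[i] > 0]
    return cluster, [PRODUCTS[i] for i in ranked[:top_n]]

new_cart = [0, 1, 0, 0, 0, 0]  # this customer has only added diapers so far
cluster, suggestions = recommend(new_cart, model)
print(cluster, suggestions)
```

Ranking by cluster-center weight is only one simple way to turn a cluster assignment into suggestions; it is used here because it needs nothing beyond the fitted model.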

Description

Our application will be built using IBM Streams on IBM Cloud Pak® for Data. IBM Streams provides a built-in IDE, called Streams Flows, that allows you to visually create a streaming app. The IBM Cloud Pak for Data platform provides additional support, such as integration with multiple data sources, built-in analytics, Jupyter Notebooks, and machine learning.

To build and deploy our machine learning model, we will use a Jupyter Notebook in IBM Watson® Studio and a Watson Machine Learning instance. In our examples, both are running on IBM Cloud Pak for Data.
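
Once a model is deployed, scoring comes down to sending rows of input values and getting one prediction back per row. The sketch below imitates that exchange locally so it can run without a Watson Machine Learning instance. The `score_locally` helper is hypothetical, and the payload and response layout (`input_data`, `fields`, `values`, `predictions`) is an assumption modeled on common Watson Machine Learning examples, not something taken from this pattern.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy training data standing in for the notebook's cart matrix (made up).
X = np.array([[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]], dtype=float)
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

def score_locally(model, payload):
    """Local stand-in for an online-scoring call: one prediction per row in 'values'.

    The payload and response shapes are assumptions, not a documented contract.
    """
    results = []
    for batch in payload["input_data"]:
        rows = np.asarray(batch["values"], dtype=float)
        labels = model.predict(rows)
        results.append({
            "fields": ["prediction"],
            "values": [[int(label)] for label in labels],
        })
    return {"predictions": results}

# Hypothetical field names; one row per shopping cart to score.
payload = {"input_data": [{"fields": ["p0", "p1", "p2", "p3"],
                           "values": [[1, 1, 0, 0], [0, 0, 1, 1]]}]}
response = score_locally(model, payload)
print(response)
```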

Using the Streams Flows editor, we will create a streaming app with the following operators:

  • A Source operator that generates sample clickstream data
  • A Filter operator that keeps only the “add to cart” events
  • A Code operator where we use Python code to arrange the shopping cart items into an input array for scoring
  • A WML Deployment operator to assign the customer to a cluster
  • A Debug operator to display the results
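
The Filter and Code steps in the list above can be sketched in plain Python. This is an illustration under stated assumptions, not the code from the pattern: the event field names (`customer_id`, `click_event_type`, `product_name`), the `add_to_cart` value, and the product catalog are hypothetical, and the `init`/`process` function pair is assumed to resemble the template that the Code operator provides.

```python
# Hypothetical catalog; its order defines the positions in the input array.
PRODUCTS = ["baby food", "diapers", "formula", "lotion", "milk", "bread"]

def is_add_to_cart(event):
    """Filter step: keep only the 'add to cart' events."""
    return event.get("click_event_type") == "add_to_cart"

def init(state):
    """Set up operator state: one cart (a set of product names) per customer."""
    state["carts"] = {}

def process(event, state):
    """Code step: update the customer's cart and emit it as a 0/1 input array."""
    cart = state["carts"].setdefault(event["customer_id"], set())
    cart.add(event["product_name"])
    return {
        "customer_id": event["customer_id"],
        "cart_array": [1 if p in cart else 0 for p in PRODUCTS],
    }

# Made-up clickstream standing in for the Source operator's sample data.
events = [
    {"customer_id": 7, "click_event_type": "browse", "product_name": "milk"},
    {"customer_id": 7, "click_event_type": "add_to_cart", "product_name": "diapers"},
    {"customer_id": 9, "click_event_type": "add_to_cart", "product_name": "bread"},
    {"customer_id": 7, "click_event_type": "add_to_cart", "product_name": "formula"},
]

state = {}
init(state)
outputs = [process(e, state) for e in events if is_add_to_cart(e)]
for out in outputs:
    print(out)
```

Keeping the cart in operator state means each new event produces an array that reflects everything the customer has added so far, which is the form a scoring step would need.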

Flow

  1. User builds and deploys a machine learning model.
  2. User creates and runs an IBM Streams application.
  3. The Streams Flows UI shows streaming, filtering, and scoring in action.

Instructions

Ready to get started? The README explains the steps to:

  1. Verify access to your IBM Streams instance on Cloud Pak for Data.
  2. Create a new project in Cloud Pak for Data.
  3. Build and store a model.
  4. Associate the deployment space with the project.
  5. Deploy the model.
  6. Create and run a Streams Flows application.

Congratulations! This code pattern wraps up the Get started with IBM Streams series. In addition to explaining IBM Streams, we’ve shown how to:

  • Create your first IBM Streams app without writing code
  • Build an Apache Kafka streaming app
  • Build a streaming app using a Python API
  • Score streaming data with a machine learning model

You should now have a fundamental understanding of IBM Streams and some of its features. If you want to learn more, take a look at the Introduction to streaming analytics with IBM Streams video series.