Ingest data from Apache Kafka

This is part of the Learning path: Get started with IBM Streams.

| Level | Topic | Type |
| --- | --- | --- |
| 100 | Introduction to IBM Streams | Article |
| 101 | Create your first IBM Streams app without writing code | Tutorial |
| 201 | Ingest data from Apache Kafka | Code pattern |
| 301 | Build a streaming app using a Python API | Code pattern |
| 401 | Score streaming data with a machine learning model | Code pattern |

Summary

In this developer code pattern, we walk you through the basics of creating a streaming application powered by Apache Kafka, one of the most popular open source distributed event-streaming platforms used for creating real-time data pipelines and streaming apps. The application will be built using IBM Streams on IBM Cloud Pak® for Data.

Description

In this pattern, we walk you through the basics of creating a streaming application powered by Apache Kafka. Our app will be built using IBM Streams on IBM Cloud Pak for Data. IBM Streams provides a built-in IDE (Streams Flows) that allows you to visually create a streaming app. The IBM Cloud Pak for Data platform provides additional support, such as integration with multiple data sources, built-in analytics, Jupyter Notebooks, and machine learning.

For our Apache Kafka service, we will be using IBM Event Streams on IBM Cloud, which is a high-throughput message bus built on the Kafka platform. In the following examples, we will show it as both a source and a target of clickstream data — data captured from user clicks as they browsed online shopping websites.
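To make the source/target roles concrete, here is a minimal sketch of producing a clickstream event to Event Streams. The helper below is self-contained; the commented section shows roughly how it would be wired to a producer using the third-party kafka-python package. The topic name `clickstream`, and the `BROKERS` and `API_KEY` values, are placeholders you would take from your own Event Streams service credentials.

```python
import json
import time

def encode_click_event(user_id, product_url):
    """Serialize a hypothetical clickstream event into the JSON-encoded
    bytes a Kafka producer would send as the message value."""
    event = {
        "user_id": user_id,
        "url": product_url,
        "ts": int(time.time()),
    }
    return json.dumps(event).encode("utf-8")

# With kafka-python installed and Event Streams credentials in hand,
# sending the event would look roughly like this:
#
#   from kafka import KafkaProducer
#   producer = KafkaProducer(
#       bootstrap_servers=BROKERS,      # kafka_brokers_sasl from credentials
#       security_protocol="SASL_SSL",
#       sasl_mechanism="PLAIN",
#       sasl_plain_username="token",
#       sasl_plain_password=API_KEY,    # api_key from credentials
#   )
#   producer.send("clickstream", encode_click_event("u123", "/shop/item/42"))
#   producer.flush()
```

Event Streams authenticates Kafka clients over SASL_SSL with the literal username `token` and the service API key as the password, which is why those fields appear in the sketch.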

Flow

  1. User creates streaming app in IBM Streams.
  2. Streaming app uses Kafka service via IBM Event Streams to send/receive messages.
  3. Jupyter notebook is generated from IBM Streams app.
  4. User executes streaming app in Jupyter notebook.
  5. Jupyter notebook accesses Kafka service via IBM Event Streams to send/receive messages.
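On the consuming side (steps 4 and 5), the generated notebook reads messages back from the topic and analyzes them. As an illustrative sketch only, the aggregation below counts clicks per URL from decoded events; the commented lines show roughly how the events would be read with the kafka-python package, again with `clickstream` and `BROKERS` as placeholder names.

```python
import json
from collections import Counter

def clicks_per_url(events):
    """Count clicks per URL across a batch of decoded clickstream events,
    a simple stand-in for the analysis done in the generated notebook."""
    return Counter(e["url"] for e in events)

# Reading the raw messages with kafka-python would look roughly like:
#
#   from kafka import KafkaConsumer
#   consumer = KafkaConsumer("clickstream", bootstrap_servers=BROKERS,
#                            security_protocol="SASL_SSL", ...)
#   events = [json.loads(msg.value) for msg in consumer]
```

For example, a batch with two clicks on one product page and one on another yields a count of 2 for the first URL.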

Instructions

Ready to get started? The README explains the steps to:

  1. Clone the repo
  2. Provision Event Streams on IBM Cloud
  3. Create a sample Kafka console Python app
  4. Add IBM Streams service to Cloud Pak for Data
  5. Create a new project in Cloud Pak for Data
  6. Create a Streams Flow in Cloud Pak for Data
  7. Create a Streams Flow with Kafka as source
  8. Use Streams Flow option to generate a notebook
  9. Run the generated Streams Flow notebook

This pattern is part of the Learning path: Get started with IBM Streams. To continue the series and learn more about IBM Streams, check out the next code pattern, Build a streaming app using a Python API.