In this developer code pattern, we walk you through the basics of creating a streaming application powered by Apache Kafka, one of the most popular open source distributed event-streaming platforms for building real-time data pipelines and streaming apps. The application will be built using IBM Streams on IBM Cloud Pak® for Data.
IBM Streams provides a built-in IDE, Streams Flows, that lets you build a streaming app visually. The IBM Cloud Pak for Data platform adds further support, such as integration with multiple data sources, built-in analytics, Jupyter Notebooks, and machine learning.
For our Apache Kafka service, we will be using IBM Event Streams on IBM Cloud, which is a high-throughput message bus built on the Kafka platform. In the following examples, we use it as both a source and a target of clickstream data, that is, data captured from user clicks as they browse online shopping websites.
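To make the idea of using Event Streams as a target more concrete, here is a minimal Python sketch of the two pieces a producer needs: connection settings and a serialized event. It is not taken from the pattern's repo. It assumes the configuration key names used by the `confluent-kafka` client, API-key authentication with the user name `token`, and a hypothetical clickstream event shape (`customer_id`, `product`, `action`, `ts`). The broker addresses and topic name are placeholders.

```python
import json
import time


def event_streams_config(bootstrap_servers, api_key):
    """Build client settings for a SASL_SSL connection to a Kafka service.

    Key names follow the confluent-kafka (librdkafka) convention; other
    clients use different names for the same settings.
    """
    return {
        "bootstrap.servers": ",".join(bootstrap_servers),
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": "token",  # assumed: literal user name paired with an API key
        "sasl.password": api_key,
    }


def encode_click(customer_id, product, action, ts=None):
    """Serialize one hypothetical clickstream event as UTF-8 JSON bytes."""
    event = {
        "customer_id": customer_id,
        "product": product,
        "action": action,
        "ts": time.time() if ts is None else ts,
    }
    return json.dumps(event, sort_keys=True).encode("utf-8")


# Sketch of how these would be used with a real client (not executed here):
#   from confluent_kafka import Producer
#   producer = Producer(event_streams_config(brokers, api_key))
#   producer.produce("clicks", value=encode_click(42, "sku-1", "add_to_cart"))
#   producer.flush()
```

The broker list and API key would come from the service credentials of your own Event Streams instance. Keeping the configuration in one function means the same settings can be reused when the service acts as a source.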
- User creates streaming app in IBM Streams.
- Streaming app uses Kafka service via IBM Event Streams to send/receive messages.
- Jupyter notebook is generated from IBM Streams app.
- User executes streaming app in Jupyter notebook.
- Jupyter notebook accesses Kafka service via IBM Event Streams to send/receive messages.
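The receive side of the flow above can be sketched as a small polling loop. This is an illustration only, not the code that Streams Flows generates. It assumes a consumer object with the same interface as `confluent_kafka.Consumer`, where `poll(timeout)` returns a message or `None`, and each message exposes `error()` and `value()`. The `FakeConsumer` and `FakeMessage` classes are hypothetical stand-ins so the example runs without a broker.

```python
import json


def drain(consumer, max_messages, timeout=1.0):
    """Poll until max_messages events are decoded or a poll times out."""
    events = []
    while len(events) < max_messages:
        msg = consumer.poll(timeout)
        if msg is None:  # nothing arrived within the timeout
            break
        if msg.error():  # skip messages that carry an error
            continue
        events.append(json.loads(msg.value().decode("utf-8")))
    return events


class FakeMessage:
    """Hypothetical stand-in for a Kafka message."""

    def __init__(self, payload):
        self._payload = payload

    def error(self):
        return None

    def value(self):
        return self._payload


class FakeConsumer:
    """Hypothetical stand-in that hands out queued payloads, then None."""

    def __init__(self, payloads):
        self._messages = [FakeMessage(p) for p in payloads]

    def poll(self, timeout):
        return self._messages.pop(0) if self._messages else None


consumer = FakeConsumer([b'{"action": "view"}', b'{"action": "add_to_cart"}'])
received = drain(consumer, max_messages=5)
```

With a real client, the consumer would first be subscribed to the topic before `drain` is called. Passing the consumer in as an argument keeps the loop testable without a network connection.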
Ready to get started? The README explains the steps to:
- Clone the repo
- Provision Event Streams on IBM Cloud
- Create sample Kafka console Python app
- Add IBM Streams service to Cloud Pak for Data
- Create a new project in Cloud Pak for Data
- Create a Streams Flow in Cloud Pak for Data
- Create a Streams Flow with Kafka as source
- Use Streams Flow option to generate a notebook
- Run the generated Streams Flow notebook