Clickstream analysis is the process of collecting, analyzing, and reporting on which web pages a user visits, and can offer useful information about the usage characteristics of a website. In this code pattern, we will use clickstream analysis to demonstrate how to detect real-time trending topics on the Wikipedia website.
Some popular use cases for clickstream analysis include:
- A/B testing – Statistically study how users of a website are affected by changes from version A to B.
- Recommendation generation on shopping portals – The click patterns of users on a shopping portal indicate how a user was influenced into buying something. This information can be used to generate recommendations for future users with similar click patterns.
- Targeted advertisement – Similar to recommendation generation, but user clicks are tracked across websites and used to target advertisements in real time and more accurately.
- Trending topics – Clickstream analysis can be used to study or report trending topics in real time. For a given time window, display the top items that receive the highest number of user clicks.
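The trending-topics use case can be sketched in plain Python, before any streaming framework enters the picture: count clicks per topic within a time window and report the top N. The function and data below are illustrative, not part of the pattern's code.

```python
from collections import Counter

def trending_topics(clicks, window_start, window_end, top_n=3):
    """Return the top_n most-clicked topics for events in [window_start, window_end).

    `clicks` is an iterable of (timestamp, topic) pairs; timestamps are
    plain numbers (e.g. epoch seconds) to keep the sketch simple.
    """
    counts = Counter(topic for ts, topic in clicks
                     if window_start <= ts < window_end)
    return counts.most_common(top_n)

clicks = [
    (0, "Python"), (1, "Spark"), (2, "Python"), (3, "Kafka"),
    (4, "Python"), (5, "Spark"), (61, "Rust"),  # last event falls outside the window
]
print(trending_topics(clicks, window_start=0, window_end=60, top_n=2))
# [('Python', 3), ('Spark', 2)]
```

A streaming engine such as Spark performs essentially this aggregation, but continuously and over sliding windows.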
In this code pattern, we will demonstrate how to detect real-time trending topics on Wikipedia. To perform this task, Apache Kafka will be used as a message queue, and the Apache Spark Structured Streaming engine will be used to perform the analytics. This combination is well known for its usability, high throughput, and low-latency characteristics.
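The Kafka-to-Spark pipeline just described can be sketched with PySpark's Structured Streaming API. The topic name (`clicks`), broker address, and window size below are placeholders, and running this requires the `spark-sql-kafka` connector package on the Spark classpath; treat it as a shape of the solution rather than the pattern's exact code.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("ClickstreamTrends").getOrCreate()

# Subscribe to the clickstream topic; Kafka records arrive as binary key/value pairs.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "clicks")
          .load()
          .selectExpr("CAST(value AS STRING) AS page", "timestamp"))

# Count clicks per page over 60-second windows to surface trending topics.
trends = (events
          .groupBy(window(col("timestamp"), "60 seconds"), col("page"))
          .count()
          .orderBy(col("count").desc()))

# Stream the running counts to the console sink (a notebook would typically
# use a memory sink and query it instead).
query = (trends.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```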
When you complete this code pattern, you will understand the following flow:
- The user connects to the Apache Kafka service and sets up a running instance of a clickstream.
- The user runs a Jupyter Notebook in IBM Data Science Experience that interacts with the underlying Apache Spark service. Alternatively, this can be done locally by running the Spark Shell.
- The Apache Spark service reads and processes data from the Apache Kafka service.
- Processed Kafka data is relayed back to the user via the Jupyter Notebook (or console sink if running locally).
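The first step of the flow above, feeding click events into Kafka, can be sketched with the third-party `kafka-python` client. The topic name and event fields here are assumptions for illustration, not the pattern's actual schema.

```python
import json
import time

def make_click_event(page, user_id):
    """Build one clickstream event as a JSON-encoded payload (fields are illustrative)."""
    return json.dumps({"page": page, "user": user_id, "ts": time.time()}).encode("utf-8")

def stream_clicks(producer, topic, events):
    """Send each (page, user_id) pair to the given Kafka topic as a click event."""
    for page, user_id in events:
        producer.send(topic, make_click_event(page, user_id))
    producer.flush()

# Usage (requires a running broker and the kafka-python package):
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers="localhost:9092")
#   stream_clicks(producer, "clicks", [("Python", "u1"), ("Spark", "u2")])
```

On the consuming side, Spark reads these payloads from the same topic, aggregates them, and the results are surfaced back in the notebook or console sink.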