In this code pattern, two Jupyter Notebooks are provided to ingest and analyze clickstream data. The code pattern and the notebooks are based on the 3-part blog series “Ingest and analyze streaming event data at scale with IBM Db2 EventStore.” If you’d like to try out the code, just follow the code pattern. Here’s what you’ll do:

  • Try Jupyter Notebooks using Scala
  • Interact with Event Store using Scala and Spark SQL
  • Create compelling visualizations with Brunel


The clickstream use case is a popular one. Tracking website usage is important for providing a better user experience, targeted offerings, and customer support. As a developer, however, I tend to look at code patterns as an introduction, example, and reference for using new developer tools. Let’s look at some of that good stuff…

Event Store

We used IBM Db2 Event Store to hold the data. If you follow the README and walk through the code pattern, you’ll get a chance to install the developer version and try it out. Event Store is ideal for event-driven solutions that need to scale and support analysis on both historical and live events. The code pattern uses only a small amount of data, so it’s an easy example to follow. And you’ll see how easy it is to interact with Event Store through a simple API and through Spark SQL. Did I mention Scala? Oh yeah, this code pattern uses Scala, but the API is also available for other languages. If you’re interested in Python and Java examples, check out this code pattern too!

The Event Store installation also includes a Jupyter Notebook environment that supports Scala notebooks. We’ll use that to run the notebooks.

Scala in a Jupyter Notebook!

I know! The “py” in Jupyter stands for Python, but Scala looks like a pretty good fit too. With the Apache Toree kernel, you can access the Event Store data using the Event Store APIs and the Spark SQL APIs. You won’t find classes or functions in this example, so it isn’t a comprehensive introduction to Scala. But running Scala cell by cell in a notebook, mixing markdown instructions, code, and output, shows what’s great about Jupyter Notebooks. You can explain and code. You can tweak code, re-run a cell, regenerate a graph, and experiment all you want until you reach your conclusion.

Spark SQL

Event Store is built on Apache Spark, so you’ll find plenty of Spark SQL and DataFrames in “the analyze notebook.” We manipulate and aggregate the data. For example, we use aggregation to look at the data by product line, product, and feature. We also calculate the time spent on web pages and augment the data with the day of the week. Ultimately, the DataFrames are prepared and ready to feed the charts. If you work with Jupyter Notebooks in any language, you should get familiar with DataFrames and charts. Let’s look at the charts…
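To make those three steps (aggregation, time on page, day of week) concrete, here is a minimal sketch in plain Scala with no Spark or Event Store dependency, written REPL-style like notebook cells. The `Click` record, its field names, the helper functions, and the sample data are all hypothetical; they are not the schema or code from the analyze notebook. It also assumes “time on page” means the gap between a click and the next click in the same session, which may differ from how the notebook computes it.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.TextStyle
import java.util.Locale

// Hypothetical clickstream record; field names are illustrative only and
// are not the schema used by the code pattern's notebooks.
case class Click(sessionId: String, productLine: String, product: String,
                 feature: String, page: String, epochSeconds: Long)

// Aggregation: count clicks per (product line, product, feature),
// analogous to a DataFrame groupBy(...).count().
def clicksByFeature(clicks: Seq[Click]): Map[(String, String, String), Int] =
  clicks
    .groupBy(c => (c.productLine, c.product, c.feature))
    .map { case (key, group) => key -> group.size }

// Time on page (assumption): seconds between a click and the next click in
// the same session. The last click of a session has no successor, so it is
// left out rather than given a guessed duration.
def timeOnPage(clicks: Seq[Click]): Seq[(Click, Long)] =
  clicks.groupBy(_.sessionId).values.toSeq.flatMap { session =>
    val ordered = session.sortBy(_.epochSeconds)
    ordered.zip(ordered.drop(1)).map { case (cur, next) =>
      cur -> (next.epochSeconds - cur.epochSeconds)
    }
  }

// Augmentation: short English day-of-week label, evaluated in UTC.
def dayOfWeek(c: Click): String =
  Instant.ofEpochSecond(c.epochSeconds)
    .atZone(ZoneOffset.UTC)
    .getDayOfWeek
    .getDisplayName(TextStyle.SHORT, Locale.ENGLISH)

// Small made-up sample: two sessions, timestamps in seconds since the epoch.
val sample = Seq(
  Click("s1", "Camping", "Tent", "reviews", "/tent", 0L),
  Click("s1", "Camping", "Tent", "specs", "/tent/specs", 30L),
  Click("s1", "Camping", "Stove", "reviews", "/stove", 90L),
  Click("s2", "Camping", "Tent", "reviews", "/tent", 100L)
)

println(clicksByFeature(sample))
println(timeOnPage(sample).map { case (c, secs) => (c.page, secs) })
println(sample.map(dayOfWeek).distinct)
```

With DataFrames, the same ideas would typically map onto `groupBy(...).count()`, the `lead` window function over `Window.partitionBy(...).orderBy(...)`, and `date_format(..., "E")` from `org.apache.spark.sql.functions`; the notebook itself is the reference for the exact approach.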

Brunel Visualization

In “the analyze notebook,” we aim for compelling visualizations that give you insight at a glance, with interactive elements that let you explore the data.

When you get the hang of it, you’ll find that Brunel is a powerful and succinct language. The examples show quite a few features in very few lines of code:

  • Setting the chart position to create a “4-up” presentation (you can overlay them too)
  • Adding titles
  • Adding legends
  • Customizing tool-tips
  • Wiring the selection on one chart to highlight that data on another chart
  • A variety of chart types
  • And more…

If you play around with it, you’ll see this is just the beginning of what you can do with Brunel.

Give it a try

If you are a developer, please give it a try. I think you’ll find a bunch of useful tools here.

Don’t get into code that much? Well, that’s a bummer, but you can still give the code pattern a quick read (and check out that 3-part blog series) and look at the example output it includes. You should get a pretty good idea of how clickstream data, or data from a similar use case, can be visualized to create the solutions you need.
