We’ve recently released a beta of our new Message Hub service in Bluemix. It’s based on the open-source Apache Kafka project. So, why did we choose Kafka and why should you be interested?

We hear a lot at the moment about two-speed IT: steady and fast. The steady speed of IT refers to the traditional IT systems which have underpinned corporations for decades. Dependable, reliable, and with a very deliberate, measured approach to change. In contrast, the fast speed of IT refers to more modern, agile IT systems often used to build web and mobile applications. Loosely coupled services, rapidly developed and enhanced, “born in the cloud”. Message Hub is messaging for the fast speed of IT.

There are two key attributes of Kafka which made it a particularly good choice as the foundation of Message Hub: availability and scalability.

Kafka was created as a highly available, high-throughput, massively scalable publish/subscribe messaging system. Kafka is deployed as a cluster of brokers. The brokers replicate message data between them, so there are multiple copies. The cluster can tolerate the failure of a broker and continue processing messages without having to recover the failed broker. If the failed broker was the “leader” for a topic, the cluster “elects” a new leader from the remaining brokers that hold replicas of the topic, and messages continue to flow almost immediately.
 
There’s no need to recover the failed broker before messaging can continue, so availability is improved. Of course, when a failed broker is restarted, it has to catch up, but once it’s back in sync with the other brokers, the cluster is back up to full strength.

You arrange the cluster so that the brokers are in different failure domains, meaning that a single failure affects only one broker. Because messages are replicated, and a failed broker can be tolerated as long as each topic still has at least one up-to-date replica, Kafka can play a neat trick to make sure messages are not lost.
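
To make that concrete, here's a minimal sketch of creating a topic with a replication factor of three using Kafka's Java AdminClient. The AdminClient arrived in Kafka releases later than the one described here, and the broker addresses and topic name are hypothetical; in Message Hub itself, you create topics through the service rather than from your own code.

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical broker addresses; a real Message Hub instance supplies
        // these in its service credentials.
        props.put("bootstrap.servers", "broker-0:9093,broker-1:9093,broker-2:9093");

        try (AdminClient admin = AdminClient.create(props)) {
            // One partition, three replicas: the data lives on three brokers,
            // so losing one broker still leaves two up-to-date copies and a
            // new leader can be elected from them.
            NewTopic topic = new NewTopic("web-telemetry", 1, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```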

Every message is written to disk, so messages are stored durably. However, they are written as efficiently as possible using the operating system's cache, which is far more efficient than eagerly flushing each message straight to disk. But doesn't this mean that, following a power failure, a broker might lose messages not yet written to disk? Well, that broker may well have lost some data, but the other replicas will be intact. The cluster as a whole hasn't lost a thing, provided that every topic still has at least one broker remaining.
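
Producers get to choose how strong that guarantee is. Here's a minimal sketch, assuming the hypothetical brokers and topic from above: setting acks to "all" makes the leader acknowledge a send only once the in-sync replicas also have the message, so the loss of any one broker's cached writes loses nothing overall.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DurableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-0:9093,broker-1:9093,broker-2:9093"); // hypothetical
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // "all": the leader acknowledges the send only once the in-sync
        // replicas also have the message, so a single broker losing its
        // OS-cached writes in a power failure loses no data overall.
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("web-telemetry", "page-42", "view"));
        } // close() waits for outstanding sends to complete
    }
}
```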

In Message Hub, messages are kept for 24 hours whether they've been consumed or not, which means you can confidently perform operations such as taking a consuming application offline for an upgrade without worrying about hitting a limit on unconsumed messages. They're safely on disk, and the application can pick up from where it left off when it starts again. You can even re-process messages you've seen before, provided you keep track of the offsets of the messages you want to see again.
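
Here's a hedged sketch of what that re-processing looks like with the Kafka consumer API, assuming a hypothetical topic, partition, and previously saved offset: attach to the partition, seek back, and poll.

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReplayingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-0:9093"); // hypothetical
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false"); // we track offsets ourselves

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Attach to one partition directly and wind back to an offset we
            // saved earlier; the broker re-delivers from that point because
            // the messages are still on disk within the retention period.
            TopicPartition partition = new TopicPartition("web-telemetry", 0);
            consumer.assign(Collections.singletonList(partition));
            consumer.seek(partition, 12345L); // a previously saved offset (hypothetical)

            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
```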

What about scalability? Kafka topics, as I've just described them, already give very good throughput. If you search the internet, you can find plenty of performance results which show how fast Kafka can go. The brokers concentrate on efficiently writing messages to disk, omitting many features of other messaging systems that would limit performance. But Kafka has another trick up its sleeve. If you reach the limit of what a topic can handle, you can add more partitions to the topic, multiplying the throughput you can achieve. It's just a simple configuration change for the topic. Each partition has its own replicas, and they are automatically spread across the cluster to give the best performance. I've heard of one system in which a topic has 19 partitions, and that corresponds to millions of messages a second. We've yet to enable this feature for Message Hub, but it's coming.
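
Here's a minimal sketch of how that scaling looks from the consuming side, with hypothetical names throughout: every copy of a program that joins the same consumer group is handed a share of the topic's partitions, so more partitions plus more workers means more messages processed in parallel.

```java
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AnalyticsWorker {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-0:9093"); // hypothetical
        // Every copy of this program started with the same group.id joins
        // one consumer group; Kafka divides the topic's partitions among
        // the members, so adding partitions and starting more workers
        // multiplies the processing rate.
        props.put("group.id", "analytics");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Arrays.asList("web-telemetry"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value()); // process one event
                }
            }
        }
    }
}
```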

Taken together, this means you can achieve things that are impossible with most other messaging systems. For example, let's say you want to analyse the way customers are using your web site, such as how long particular pages are on screen. You could instrument the web site and stream the telemetry data to an analytics system. For a busy web site, that's a lot of data, and you certainly don't want that processing to affect the responsiveness of the site itself. So, take the web site data, publish it to Message Hub, and use that to feed the analytics system. If the volume of data is overwhelming, increase the number of partitions of the topic and scale up the analytics system. And because it's publish/subscribe, you can even run more than one analytics system in parallel if you like.
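
As a final hedged sketch, again with hypothetical names: giving a second analytics system its own group.id means it receives its own complete copy of the stream, independently of the first, because each consumer group tracks its own offsets.

```java
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SecondAnalyticsSystem {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-0:9093"); // hypothetical
        // A different group.id from the "analytics" workers above: consumer
        // groups are independent, so this system receives its own complete
        // copy of the stream rather than sharing it with the first system.
        props.put("group.id", "experiments-analytics");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Arrays.asList("web-telemetry"));
            while (true) {
                consumer.poll(100).forEach(record -> System.out.println(record.value()));
            }
        }
    }
}
```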

We’re still in the early days of Message Hub, but we’re introducing improvements all of the time. (To see what new features we’re experimenting with, take a look at the Message Hub Incubator.)

For a blog post that summarizes the benefits of having Apache Kafka as a service, read Message Hub: Apache Kafka as a Service.

3 comments on"IBM Message Hub brings Apache Kafka to Bluemix"

  1. […] Kafka 0.9 was released which is now available as Messaging Hub (beta) service in Bluemix. For developers there are different APIs available and […]

  2. […] Those engineers left LinkedIn to launch Confluent which provides a commercial version of Kafka. And Kafka just got a big vote of support from IBM, which added support for Kafka to IBM's app-hosting cloud, BlueMix. […]

Join The Discussion

Your email address will not be published.