Building a web application in 2015 is easy and cheap. With a single affordable virtual server, you can host your entire web stack:

  • Web server. Nginx or Apache Httpd
  • Static assets. JavaScript, CSS, images, and media files stored on the server’s disk
  • Application logic. Your PHP/Ruby/Node.js/etc code
  • Database storage. Your application’s dynamic data in a MySQL/PostgreSQL/NoSQL database
  • Dynamic assets. Uploaded images, videos, or user-generated content
  • Logs. The engineering logs showing how your app is working

A single-server instance is fine for low-volume websites that receive small amounts of traffic or generate little data. But successful applications grow VERY quickly; distribution using app stores and viral social media coverage can bombard a little-known app with eye-watering volumes of traffic in no time.

With success comes the problem of scaling your application. As more users arrive and the volume of data grows, it becomes obvious that your stack needs to be reorganised to expand horizontally, adding extra capacity and introducing fault tolerance:

  • What if a web server fails? You should really have three of those with a load balancer in front of them. To be safe, the load balancer itself should be a pair running in a high-availability (HA) configuration.
  • Your web servers shouldn’t be bogged down serving static assets. Move the assets to a Content Delivery Network (CDN), which also distributes them around the globe so they are as close as possible to your users.
  • You can’t have your database on the same server as your web server. For a start, there are now three web servers! Move it to a separate database server, a cluster of database servers, or even a cloud-hosted solution.
  • Your users’ uploaded content has grown out of all proportion. You need a scalable file storage service. This is where Object storage comes in: push uploaded content into Object storage, which scales massively.
  • A successful application generates tons of logs: logs recording user requests, application status, error conditions, and so on. With a move from one server to many, you need to see all your application’s logs in one time-ordered stream so that the entire story can be viewed in sequence.
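For example, the "three web servers behind a load balancer" step could be sketched as a minimal nginx configuration. The upstream name and server addresses here are placeholders, not a prescription:

```nginx
# Hypothetical load balancer fronting three web servers.
upstream app_servers {
    server 10.0.0.11;
    server 10.0.0.12;
    server 10.0.0.13;
}

server {
    listen 80;
    location / {
        # Requests are distributed round-robin across the upstream group.
        proxy_pass http://app_servers;
        proxy_set_header Host $host;
    }
}
```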

Making those changes moves our application stack from this:

[Figure: single-server architecture]

to this:

[Figure: multi-server architecture]

The new setup may be a more expensive and complicated stack, but it can handle more users, is more reliable, and can be expanded to deal with extra load, like the Christmas rush. Each component in the stack can be scaled individually to suit its workload.

But a successful application has other growing pains that are not immediately obvious.

Asynchronous workloads – the signup conundrum

Imagine your application has a sign-up path. A user enters their name, email, a photo and Twitter account. The app must:

  • resize the photo into several thumbnail sizes and upload the pictures to our Object store
  • access the Twitter profile to pull the user’s biography and follower list, storing the information in a database
  • send an email to the user to verify their email account (the user clicks on the link and then we know they are real)

All of the above actions take some time, some computing power, some network bandwidth, and some API calls to external services. The tasks may take different amounts of time, and it would be logical to enable each task to scale individually. In our single-server architecture, the web server itself would have resized the images, uploaded them to Object storage, called Twitter, and sent the email. But to be truly scalable, we need to dedicate our web servers to handling incoming HTTP requests and replying as soon as they can, giving the user a snappy experience.

What is required here is a queue, or rather a number of queues: one for each task.

[Figure: three task queues fed from the registration queue]

When a registration request arrives at the web server, it puts an item into a queue called registration_queue. A worker process listens for items arriving on the registration_queue and performs one job: for each item received, it adds an item to each of three task-specific queues: resize_photo, scrape_twitter, and verify_email (in a minute, you’ll see why we do it in this seemingly convoluted way). Worker processes monitoring each of these queues pick up each item of work, in turn, and perform the actual work. Because many worker servers can be assigned to the queues, we can provision as many as it takes to process the work at the required rate! If processing the photos is computationally more expensive, we can assign beefier servers to the resize_photo queue than to the other queues. If, at a later date, we wish to add a fourth step to the registration process, we need only create another queue, add workers to it, and modify the registration_queue worker to add an item to this queue for each registrant.
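The fan-out step can be sketched with in-memory queues. In a real deployment these would be durable queues in a message broker; the queue names match the text, but the payload fields are hypothetical:

```python
import queue

# In-memory stand-ins for durable broker queues (RabbitMQ, Kafka, etc.)
registration_queue = queue.Queue()
task_queues = {
    "resize_photo": queue.Queue(),
    "scrape_twitter": queue.Queue(),
    "verify_email": queue.Queue(),
}

def registration_worker():
    """One job: fan out each registration to the three task-specific queues."""
    while not registration_queue.empty():
        registrant = registration_queue.get()
        for q in task_queues.values():
            q.put(registrant)
        registration_queue.task_done()

# A web server handling a sign-up would simply enqueue and return:
registration_queue.put({"name": "Ada", "email": "ada@example.com"})
registration_worker()
```

Adding a fourth registration step means adding one entry to `task_queues`; neither the web server nor the existing workers change.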

Furthermore, when someone clicks on the link in their verification email, the request arrives at a web server which adds an item to the registration_success queue. A process listening to that queue sends another email to the user to inform them that they have registered successfully.

Offline workloads – the stats dilemma

As your successful startup grows, more people will be employed to analyse and report on the success of the business. How many users signed up today? How many of them went on to verify their email addresses?

To answer these questions, we would have to collect data from multiple sources, but the organisation of our queues makes the data easy to access: web servers add items to the registration_queue, and verified users are added to the registration_success queue.

Although worker processes acting upon these queued tasks are already consuming the data, some queue systems allow multiple consumers of the data to attach to the same queue. This lets the worker processes continue using the stream of tasks as a queue, while other consumer processes turn the data into daily aggregates and feed real-time dashboards.
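A minimal sketch of this pattern, assuming a Kafka-style log where each consumer keeps its own offset and so cannot interfere with the others (the event data is invented for illustration):

```python
from collections import Counter

# Hypothetical append-only event log, standing in for a topic in the broker.
registration_log = [
    {"user": "ada", "day": "2015-11-01"},
    {"user": "bob", "day": "2015-11-01"},
    {"user": "eve", "day": "2015-11-02"},
]

class Consumer:
    """Each consumer tracks its own position in the shared log."""
    def __init__(self, log):
        self.log, self.offset = log, 0

    def poll(self):
        events, self.offset = self.log[self.offset:], len(self.log)
        return events

worker = Consumer(registration_log)  # processes tasks, queue-style
stats = Consumer(registration_log)   # independently aggregates the same stream

tasks = worker.poll()
signups_per_day = Counter(e["day"] for e in stats.poll())
```

Because offsets are per-consumer, attaching a new reporting consumer never slows down or starves the task workers.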

In this way, the data queue scales the workload to meet demand, and can also feed multiple reporting streams to the teams that need to know what’s happening in the application.

Complex systems breed data

As the complexity of the system increases, fault-finding becomes more difficult. Tracking issues requires yet more recorded data (“Instrument Everything!”) to let people dive into all aspects of system activity. So, in addition to the web server logs, the queue activity, and the logs of each of the queue workers, we also have detailed logs from the database layer, the application logic, as well as any server or operating system-level logging.

This data may be kept forever in another long-term data store. But for the short term, it makes sense to stream this data into a queue where it can be consumed by zero or more consumers. Example consumers could be:

  • saving the log data in Apache Hadoop for historical analysis
  • streaming to Apache Spark for real-time reporting, paging the Ops team with any faults found during the analysis
  • feeding a real-time dashboard app

Once again the producer of the data knows nothing about the consumers of the data, and vice versa. There can be multiple producers and multiple consumers.

IBM Message Hub and RabbitMQ

In a large, multi-faceted IT system, the queue or Message Hub becomes the heart of the system. It receives and buffers data from many producers and delivers streams of data to connected consumers. The messaging hub needs to be scalable, fault-tolerant, and performant as it brokers every piece of data generated or consumed by our system.

[Figure: message hub at the heart of your app]

Here are two options you could use to fulfill this role:

  • Message Hub

    IBM Message Hub is based on Apache Kafka and is run as-a-service. It stores streams of data in topics, buffering the data for a while (usually a few days) before the oldest data is discarded. Consumers can ask for data in chronological order, from any point in the stream (within the last few days). Because the way in which data can be consumed is limited, Message Hub is optimised for time-ordered reads and can handle vast amounts of throughput (millions of messages per second and terabytes of data).

    Like any good queue or broker, the consumers know nothing about the producers and vice versa. The only design consideration is how many consumers you wish to have consuming a topic in parallel when they consume the data like a queue. Topics can be sharded into N partitions; in this scenario, you should deploy N consumer processes, each consuming one shard of the topic. This prevents the workers from treading on each other’s toes and spreads the workload around the Kafka cluster.
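    A rough sketch of that sharding scheme (the partition count, keys, and hash function are assumptions for illustration, not Message Hub's actual implementation):

```python
import zlib

# Kafka-style partitioning: messages with the same key always land in the
# same partition, so each of the N consumer processes can own exactly one
# partition without overlapping any other consumer.
N = 3
partitions = [[] for _ in range(N)]

def publish(key, message):
    # A stable hash of the key picks the partition deterministically.
    partitions[zlib.crc32(key.encode()) % N].append(message)

for user in ["ada", "bob", "eve", "ada"]:
    publish(user, {"user": user})

# Consumer process i would read only partitions[i].
```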

    Apache Kafka was first developed by LinkedIn to deal with the billions of messages their organisation was dealing with each day. It is open-source software and can be downloaded and installed for free but IBM’s Message Hub gives you a simple way of working with Kafka without installation, server provisioning, or maintenance.

    Third-party libraries are available in many programming languages, so it’s easy to get started in the language of your choice.

    If you’re a Java developer and want to get started with Kafka then this blog post has step-by-step instructions.

  • RabbitMQ

    RabbitMQ is a messaging application designed to behave like a queue or pub-sub hub for your application’s messaging. Our friends at Compose recently launched RabbitMQ as another service you can spin up on the Compose platform. You can create a fault-tolerant, multi-node RabbitMQ cluster in a couple of clicks.

    RabbitMQ has been around for years and is widely supported by a range of programming languages.

    Follow this getting started guide to create your own, private message broker cluster within a few minutes!
