After visiting a trade show and seeing a succession of dull stands with only leaflets to hand out, Chris Snow and I came up with the idea of building a Cloudant cluster of Raspberry Pis to put on the IBM Cloudant booth. Cloudant’s NoSQL Database-as-a-Service clusters are hidden away in the depths of data centres around the world belonging to SoftLayer, Rackspace, Microsoft and Amazon, so there is little tangible product to display at a conference stand: no software to install, no drivers required, no SQL! A working Cloudant cluster at the booth lets visitors see a distributed database in action, with flashing lights indicating per-node activity.

I started by building CouchDB 2.0 on a single Raspberry Pi running Debian Wheezy to make sure it was feasible. It wasn’t as simple as typing “sudo apt-get install couchdb” – that only gets you CouchDB 1.2. I needed CouchDB 2.0, which includes the multi-node clustering technology that Cloudant developed and donated back to Apache CouchDB’s open-source community. Installing CouchDB 2.0 is a bit more involved, requiring the project to be built from source after installing its dependencies. The single-node build worked, and so plans were made for a 12-node cluster. Here is my original sketch:

Cloudant Raspberry Pi

The idea was to have 12 Pis arranged in a ring to mimic the logical arrangement of the full-size servers in a real production Cloudant cluster. Each machine was to be connected via wifi to a router at the rear and a load balancer (a 13th Pi) would direct traffic around the cluster.

  • The hardware was ordered and boxes were unpacked. The blank SD cards were burned with fresh operating system images.
Cloudant Raspberry Pi SD cards
  • The Pis were mounted in plastic cases.
Cloudant Raspberry Pi plastic cases
  • LEDs were hand-soldered to resistors and GPIO connectors.
Cloudant Raspberry Pi GPIO connectors

Then came the tricky bit: building and installing the software on 12 devices. The automation tool I chose was Ansible, which lets you script tasks in YAML ‘playbooks’ that can be executed in parallel via SSH on multiple host machines. The playbooks I created are published in two GitHub repositories.
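As a taste of what such a playbook looks like, here is a minimal, illustrative one in the same style – the host group, package names and paths below are made-up stand-ins, not the contents of the real playbooks:

```yaml
# site.yml - illustrative only. Run against all nodes in parallel with:
#   ansible-playbook -i hosts site.yml
- hosts: couchdb_nodes
  become: true
  tasks:
    - name: Install build dependencies (package list is a stand-in)
      apt:
        name: [build-essential, erlang, libicu-dev]
        state: present

    - name: Fetch CouchDB source
      git:
        repo: https://github.com/apache/couchdb.git
        dest: /home/pi/couchdb
        # use a tagged release here rather than master - see tip 4 below
```

Because Ansible runs each task on every host in the inventory group concurrently, one playbook run configures all 12 Pis at once.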

A key feature of the project was to make each machine’s LED flash whenever that node was performing an action. To do this I created a Node.js script called ‘flasher’ which pulses an LED on and off whenever a line of text arrives on stdin. This allows output from log files to be piped to ‘flasher’ very simply e.g.

 tail -f node1.log | flasher > /dev/null &
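The real ‘flasher’ is a Node.js script that drives a GPIO pin directly, but the core idea is just “pulse once per input line”. Here is a rough sketch of that logic in Python, with the GPIO write replaced by a stub, since the actual pin-driving code is Pi-specific and lives in the repositories:

```python
import time

def set_led(on):
    # Stand-in for the GPIO write: on the Pi this would set the pin
    # wired to the LED high or low.
    set_led.state = on

def flash_on_lines(lines, pulse=0.05):
    """Pulse the LED once for every line of text that arrives,
    returning the number of pulses. In the real script `lines`
    would be sys.stdin, so any process writing text can drive
    the LED - exactly as in the tail -f pipeline above."""
    count = 0
    for _ in lines:
        set_led(True)   # LED on...
        time.sleep(pulse)
        set_led(False)  # ...and off again
        count += 1
    return count
```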

This brings me to a problem that caused me an hour or two of head-scratching. It turns out that

 tail -f node1.log | grep 'FLASH'

happily produces output, but not when that output is piped on to another process, i.e.

 # this works - each line appearing in node1.log containing 'FLASH'
 # appears on stdout
 tail -f node1.log | grep 'FLASH'

 # this doesn't work - the LED doesn't flash - the 'flasher' script
 # doesn't see any input!
 tail -f node1.log | grep 'FLASH' | flasher

Why not? You have to do:

tail -f node1.log | grep --line-buffered 'FLASH' | flasher > /dev/null &

otherwise nothing happens: when its stdout is a pipe rather than a terminal, grep buffers its output in large blocks instead of line by line, so individual log lines never reach ‘flasher’ until the buffer fills. The --line-buffered flag forces grep to flush after every line. Silly me.
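You can demonstrate this buffering behaviour without a Pi. The sketch below (assuming Linux and GNU grep) feeds grep a single matching line, leaves stdin open just as tail -f would, and checks whether any output appears within half a second:

```python
import select
import subprocess

def grep_emits_promptly(args, timeout=0.5):
    """Feed grep one matching line without closing stdin (mimicking
    tail -f) and report whether output arrives within `timeout`."""
    p = subprocess.Popen(args, stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE)
    p.stdin.write(b"FLASH all_docs\n")
    p.stdin.flush()
    # Wait briefly for grep's stdout pipe to become readable.
    ready, _, _ = select.select([p.stdout], [], [], timeout)
    p.stdin.close()  # EOF makes grep flush and exit
    p.wait()
    return bool(ready)

print(grep_emits_promptly(["grep", "FLASH"]))                     # False
print(grep_emits_promptly(["grep", "--line-buffered", "FLASH"]))  # True
```

Without --line-buffered the matching line sits in grep’s output buffer until EOF, which with tail -f never comes.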

I had to patch CouchDB’s “Fabric” Erlang code to ensure that log messages were created containing the word ‘FLASH’ whenever a node was asked to store or retrieve data e.g.

 all_docs(DbName, Options, #mrargs{keys=undefined} = Args0) ->
     couch_log:notice("FLASH all_docs", []),

The effect of this is that when a document is added to the distributed database, three machines’ LEDs flash simultaneously, indicating the three nodes that hold the shard in which the data resides. By sharding the data, Cloudant can store more data than would fit on one machine and divides the read, write and indexing load into smaller chunks.
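The mechanics can be illustrated with a toy model: the document id is hashed to pick one of q shard ranges, and each shard is hosted on n nodes (n=3 here). The hash function, shard count and ring placement below are illustrative stand-ins, not CouchDB’s actual algorithm:

```python
import zlib

NODES = ["node%d" % i for i in range(1, 13)]  # the 12-Pi ring
Q, N = 8, 3  # 8 shards, 3 copies of each

def shard_for(doc_id):
    # Stand-in hash: CouchDB uses its own hash of the document id
    # to select one of Q shard ranges.
    return zlib.crc32(doc_id.encode()) % Q

def replicas_for(doc_id):
    # Toy placement: a shard lives on N consecutive nodes of the ring.
    s = shard_for(doc_id)
    return [NODES[(s + i) % len(NODES)] for i in range(N)]

# Writing this document would light the LEDs on exactly three machines:
print(replicas_for("my-first-doc"))
```

The same id always hashes to the same shard, so reads for a document go to the same three nodes that stored it.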

When all of the database nodes were configured and a load balancer running HAProxy was built, the cluster was up and running, shown here testing the flashing of the LEDs:

Cloudant Raspberry Pi LEDs

After that, the devices were sent away to be turned into something worthy of displaying at a conference booth:

Cloudant Raspberry Pi Cloudant RP Cluster

So if you see Cloudant represented at a developer conference near you, stop by and say hello and I’ll show you how it all works: how the cluster shares the workload around its nodes, how it keeps multiple copies of the same data and how it survives node failures automatically. The cluster’s data can be replicated to other instances of CouchDB, to live Cloudant accounts, or to mobile devices running PouchDB or Cloudant Sync for iOS or Android.

Tips For Those Reproducing This Project

  1. Buy SD cards with a pre-installed operating system image. Burning your own is very slow.
  2. Use “class 10” SD cards. It doesn’t make the Raspberry Pis any faster, but it does make dealing with images on your Mac/PC a good deal quicker.
  3. Automate everything. Ansible was invaluable for coordinating actions across all the nodes in parallel.
  4. Use a tagged release of CouchDB – if you build the “master” branch, then you will also get unstable “master” versions of its dependencies.
  5. Use the new Raspberry Pi 2 model – they are much quicker and cost the same as the older models.
