This post is part of a series of posts created by the two newest members of our Developer Advocate team here at IBM Cloud Data Services. In honour of the book Seven Databases in Seven Weeks by Eric Redmond and Jim R. Wilson, we challenged Lorna and Matt to take a new database from our portfolio every day, get it set up and working, and write a blog post about their experiences. Each post reflects the story of their day with a new database. We’ll update our seven-days GitHub repo with example code as the series progresses. —The Editors

Meet RethinkDB and its mascot, The Thinker.
  • Database type: highly scalable JSON storage with real-time data feeds
  • Best tool for: situations where it’s important to quickly update when data changes

Overview

RethinkDB is a database that aims to provide performant, scalable storage that pleases both development and operations people. Inside, it’s a document database that stores JSON, is distributed by nature, and includes a user-friendly admin console for managing it. So far, nothing particularly special, but RethinkDB has a couple of tricks up its sleeve: unusually for a document database it supports joins, and it lets you keep a query’s connection open so that any later changes to the results are pushed to the client over that same connection.

RethinkDB is open source, so you can run this anywhere, although for these examples we’ll make use of the cloud and provision a RethinkDB instance on Bluemix. This article covers how to get started setting up RethinkDB and connecting your application to it. We’ve put together a quick example using a hypothetical issue tracker and paying particular attention to the data feed updates that RethinkDB offers.

Getting Set Up

To begin with, log into your Bluemix account and click on “Catalog”. Under “Data and Analytics” you should find “Compose for RethinkDB” among the available choices; click on it and provision the database to your account. It takes a few moments to spin up, and then you should find yourself at the dashboard for your RethinkDB instance.

The first step will be to create the credentials we’ll need to use during the rest of the tutorial.

Create new credentials for RethinkDB

Click on the “New Credential” button to add new access credentials to your RethinkDB database.

Connecting from Node.js

There are official libraries for Node.js, Python and Ruby, and there are many more community-contributed offerings that seem to work well, so most applications will be able to take advantage of RethinkDB’s features easily.

On your new credentials entry, click “View Credentials” to unfold the JSON-formatted details that you’ll need to access RethinkDB.

  • Authentication Credentials: Look at the uri field in the credentials block: we’ll need the host, port number, username and password from this string.
  • Certificate: this is provided in a base64 encoded format for safe transfer, so we’ll need to decode it and store the result into a file that the application can use. Ours is in a file called cert and you’ll see it referenced in our application shortly.
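The two steps above can be sketched in a few lines of Node.js. This is only an illustration: the rethinkdb:// URI, the hostname, the password and the certificate text below are made-up placeholders, and parseCredentials is a hypothetical helper name rather than anything provided by Bluemix or the driver.

```javascript
// Sketch: turn a connection URI and a base64 certificate into driver options.
// parseCredentials is a hypothetical helper; the values passed in are placeholders.
function parseCredentials(uri, caCertificateBase64) {
  const parsed = new URL(uri);
  return {
    host: parsed.hostname,
    port: Number(parsed.port),
    // URL keeps special characters percent-encoded, so decode them
    user: decodeURIComponent(parsed.username),
    password: decodeURIComponent(parsed.password),
    // Decode the base64 text back into the raw certificate bytes
    ssl: { ca: Buffer.from(caCertificateBase64, 'base64') }
  };
}

// Placeholder inputs standing in for the values shown under "View Credentials"
const placeholderCert = Buffer.from('-----BEGIN CERTIFICATE-----\nplaceholder\n').toString('base64');
const example = parseCredentials('rethinkdb://admin:s3cret@db.example.com:20623', placeholderCert);

console.log(example.host, example.port, example.user);
```

To keep the certificate in a file, as we do below, you could write the decoded bytes out once with fs.writeFileSync('./cert', example.ssl.ca).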

For this example, we used Node.js, and put all the initial configuration and setup into a file named config.js, which we included in all our other scripts (example code on GitHub). Here’s that file:

const fs = require('fs');
// CA certificate decoded from the credentials; readFileSync returns a Buffer
const cert = fs.readFileSync('./cert');

const connection = {
  host: "bluemix-sandbox-dal-9-portal.4.dblayer.com",
  port: 20623,
  user: "admin",
  password: "SAHPgKzuOeFj7qu8ZaXCDjPNz4LPrCpfWEyquasjrA",
  ssl: {
    ca: cert
  }
}

module.exports = {
  connection: connection
}

Now we can test the connection by attempting to create a database — if we can successfully do this, then we know everything is working well. Here’s our create_db.js:

const config = require('./config.js');

// RethinkDB Driver
const r = require('rethinkdb');
// connect to the DB
r.connect(config.connection, function(err, conn) {
  if(err) throw err;

  // create our DB
  r.dbCreate('issues').run(conn, function(err, data) {
    if (err) throw err;

    console.log("DB created", data);
    conn.close();
  });
});

Pro-tip: Remember to include the self-signed SSL cert that was available from the credentials screen. If you don’t yet have Node.js, we recommend Homebrew for OS X. Then just brew install node. Treehouse also has some nice instructions.

This code simply includes the config file we created earlier, connects to the RethinkDB server, creates the issues database, and outputs a log message if it is successful. At this point, we can start to use this connection to perform other operations.
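One thing to watch: running create_db.js a second time fails, because the database already exists. A minimal sketch of a re-runnable version is below. It assumes the driver’s promise style (calling run without a callback), and ensureDatabase is a hypothetical helper name of ours; the driver module r and the open connection conn are passed in as arguments.

```javascript
// Sketch: create a database only when it is missing, so the script can be re-run.
// `r` is the rethinkdb driver module and `conn` an open connection; both are
// passed in. ensureDatabase is a hypothetical helper, not part of the driver.
async function ensureDatabase(r, conn, name) {
  // dbList() resolves to an array of database names
  const existing = await r.dbList().run(conn);
  if (existing.includes(name)) {
    return false; // already there, nothing to do
  }
  await r.dbCreate(name).run(conn);
  return true; // created just now
}
```

With the real driver this could be called as r.connect(config.connection).then(conn => ensureDatabase(r, conn, 'issues')). Note that the check and the create are two separate queries, so two scripts racing each other could still collide.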

Design Your Database

As an example, we’ll consider a simple sort of bug tracker application, just allowing us to add issues and keep track of their status and so on. First, we’ll create a table to store the issues.

RethinkDB has a nice, easy web interface which you can use to create tables. You may also want to do that programmatically, so let’s start by looking at the code we used to create the issues table. Here’s our create_table.js:

const config = require('./config.js');

// RethinkDB Driver
const r = require('rethinkdb');
// connect to the DB
r.connect(config.connection, function(err, conn) {
  if(err) throw err;

  // create our table
  r.db('issues').tableCreate("issues").run(conn, function(err, data) {
    if (err) throw err;

    console.log("Table created", data);
    conn.close();
  });
});

Check the admin interface to verify that everything worked as expected: your new database should be listed, and inside it you should see your new table (still empty for now).

Importing Data

Since RethinkDB is JSON-based, it’s pretty happy to ingest JSON data of any kind, which is nice! There’s some detailed documentation on importing data, but we generated some sample data using http://json-generator.com and simply used that to quickly give ourselves something to work with.

Importing data from our application is quite simple. The data to import is saved in a file named seed_data.json in the same directory as the script. Here’s create_data.js:

const config = require('./config.js');

// RethinkDB Driver
const r = require('rethinkdb');
// seed data
const seed = require('./seed_data.json');
// connect to the DB
r.connect(config.connection, function(err, conn) {
  if(err) throw err;

  // Seed our table with some data
  r.db('issues').table("issues").insert(seed).run(conn, function(err, data) {
    if (err) throw err;

    console.log("Seed data added", data);
    conn.close();
  });
});

This is a great way to get started quickly with some data in the issues table, and it means we can move along to the fun parts: querying the data and then seeing later changes also arrive instantly.
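It’s worth glancing at the result object that the insert logs. The err argument covers the query as a whole, while problems with individual documents (a duplicate primary key, for example) are counted in the result’s errors field, with the first message in first_error. Here’s a small sketch of reading that object; summarizeInsert is a hypothetical helper and the sample results are hand-written, not real server output.

```javascript
// Sketch: interpret the result object that an insert passes to its callback.
// summarizeInsert is a hypothetical helper name.
function summarizeInsert(result) {
  if (result.errors > 0) {
    return 'inserted ' + result.inserted + ', failed ' + result.errors +
      ' (first error: ' + result.first_error + ')';
  }
  return 'inserted ' + result.inserted + ' document(s)';
}

// Hand-written sample results, not real server output
const allGood = summarizeInsert({ inserted: 3, errors: 0 });
const oneFailed = summarizeInsert({ inserted: 2, errors: 1, first_error: 'Duplicate primary key' });

console.log(allGood);
console.log(oneFailed);
```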

Fetching Data and Receiving Updates

RethinkDB has its own query language called ReQL (for the very quickest of starts, there’s even an SQL to ReQL cheatsheet). Let’s look at a very simple query. It fetches all records from our issues table, but here’s where it gets interesting: this script will then remain connected, and output further records when new data appears.

First the code that queries the database and outputs information for each issue (fetch_all_data.js):

const config = require('./config.js');

// RethinkDB Driver
const r = require('rethinkdb');
// helper function to format the output
const format = require('./format_issue.js').output;
// async
const async = require('async');
// connect to the DB
r.connect(config.connection, function(err, conn) {
  if(err) throw err;

  var actions = {
    current: function(callback) {
      // Get every issue
      r.db('issues').table("issues").run(conn, function(err, cursor) {
        if (err) throw err;

        cursor.toArray(function(err, data) {
          if (err) throw err;

          format(data);
          return callback();
        });
      });
    }
  }

  async.series(actions, function() {
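    // Stay connected and print each issue that is inserted or updated
    r.db('issues').table("issues").changes().run(conn, function(err, cursor) {
      if (err) throw err;

      cursor.each(function(err, data) {
        if (err) throw err;

        // new_val is null when a row is deleted, so skip those notifications
        if (data.new_val) format(data.new_val);
      });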
    });
  });
});

Take a look at the output of this script (this is just the last few lines):

ebd578b5-fde3-4318-bb9e-e2aaf7b43b21
ut anim sunt voluptate ex reprehenderit
STATUS: closed
================================
b22d4484-2a00-472d-b5c1-20af894ed056
est sint labore tempor veniam sit
STATUS: wontfix
================================
ef344e27-e809-44cd-8395-1c93490c546e
in officia Lorem in pariatur labore
STATUS: reopened
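The format_issue.js helper isn’t reproduced in this post. As a rough stand-in, assuming each issue has id, title and status fields (the real field names in the seed data may differ), it might look something like the sketch below. It accepts either one issue or an array, because the script calls it with both.

```javascript
// Hypothetical stand-in for format_issue.js. The field names id, title and
// status are assumptions; the real helper may differ.
const SEPARATOR = '================================';

function formatIssue(issue) {
  return [issue.id, issue.title, 'STATUS: ' + issue.status].join('\n');
}

// Accepts one issue or an array of issues, prints them, and returns the text
function output(issues) {
  const list = Array.isArray(issues) ? issues : [issues];
  const text = list.map(formatIssue).join('\n' + SEPARATOR + '\n');
  console.log(text);
  return text;
}

module.exports = { output: output };
```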

We can leave this running in the terminal and, from another window, run a second script that inserts one new row into the table. Below is a quick script to do that; it cheats by borrowing a random row from the seed data and inserting it again. And now, we give you create_new_row.js:

const config = require('./config.js');

// RethinkDB Driver
const r = require('rethinkdb');
// helper module for picking a random element
const _ = require('underscore');
// borrow a random row from the seed data
const row = _.sample(require('./seed_data.json'));
// connect to the DB
r.connect(config.connection, function(err, conn) {
  if(err) throw err;

  // Insert the single borrowed row
  r.db('issues').table("issues").insert(row).run(conn, function(err, data) {
    if (err) throw err;

    console.log(row);
    conn.close();
  });
});

With the new row in place, take a look at what’s going on in the output of our original fetch-all-the-data script:

ebd578b5-fde3-4318-bb9e-e2aaf7b43b21
ut anim sunt voluptate ex reprehenderit
STATUS: closed
================================
b22d4484-2a00-472d-b5c1-20af894ed056
est sint labore tempor veniam sit
STATUS: wontfix
================================
ef344e27-e809-44cd-8395-1c93490c546e
in officia Lorem in pariatur labore
STATUS: reopened


================================
3fc73e89-8da1-4bce-91a3-31ae897ab7b6
Lorem nisi proident ea commodo nulla
STATUS: reopened
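Inserts aren’t the only thing a feed reports. Each notification from changes() carries an old_val and a new_val: an insert has a null old_val, a delete has a null new_val, and an update has both. Here’s a small sketch of telling them apart; describeChange is a hypothetical helper and the sample notifications are hand-written.

```javascript
// Sketch: classify one changefeed notification by which side is null.
// describeChange is a hypothetical helper name.
function describeChange(change) {
  if (change.old_val === null && change.new_val !== null) return 'insert';
  if (change.old_val !== null && change.new_val === null) return 'delete';
  return 'update';
}

// Hand-written sample notifications
const samples = [
  { old_val: null, new_val: { id: 'a1', status: 'open' } },
  { old_val: { id: 'a1', status: 'open' }, new_val: { id: 'a1', status: 'closed' } },
  { old_val: { id: 'a1', status: 'closed' }, new_val: null }
];

const kinds = samples.map(describeChange);
console.log(kinds.join(', '));
```

A script that reads new_val directly, like ours, will see null whenever a row is deleted, so it’s worth deciding up front how your application should react to each kind of change.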

Conclusion

This ability to keep queries running and instantly ship updates when the data changes is a key feature of RethinkDB. It makes this tool a great choice for anything which needs to update in response to data, whether that’s changing prices on a ticker or notifying other users of a web-based tool that someone else made changes. RethinkDB drivers exist for many server-side languages, and the database is available whether you want to run it on your own hardware or deploy it as a service.
