This post is part of a series of posts created by the two newest members of our Developer Advocate team here at IBM Cloud Data Services. In honour of the book Seven Databases in Seven Weeks by Eric Redmond and Jim R. Wilson, we challenged Lorna Mitchell and Matt Collins to take a new database from our portfolio every day, get it set up and working, and write a blog post about their experiences. Each post reflects the story of their day with a new database. We’ll update our seven-days GitHub repo with example code as the series progresses. —The Editors

Your friendly neighborhood JSON store based on Apache CouchDB™.
  • Database type: Highly scalable, distributed, schemaless JSON storage with full-text search.
  • Best tool for: Building data-intensive apps that need to scale with high availability and reliability.

Overview

Cloudant is the IBM Cloud version of CouchDB. It offers everything that you’d expect from a managed CouchDB service and adds a few features specific to this platform. Some of those additions have been contributed back to the open source project, so you can now try them in the recent CouchDB 2.0 release as well.

At its heart, Cloudant is a JSON document database that lets you perform all of the familiar CRUD operations and use MapReduce to build queryable views of your data. If you need to do something a bit more taxing, the extras that you get with Cloudant, such as full-text search, geospatial indexing and Cloudant Query, should allow you to do just that.

Although CouchDB is open source, Cloudant is a hosted DBaaS with fully managed single or multi-tenant offerings – including a free tier available via the IBM Bluemix platform.

This article covers how to get started with Cloudant and uses as an example a simple student grade tracker containing information about students, their courses, and their grade for that course.

Set Up Cloudant

If you don’t have one already, you’ll want to sign up for an account on IBM Bluemix. There, use the Catalog to add a Cloudant NoSQL DB service to your account and, once it’s created, click the “Launch” button on the right-hand side. To create the new database, click the “Create Database” option in the top right of the screen and enter a name for your database; we’ve called ours “students”.

Create Database

This database is where we will store all of our student data.

Cloudant Speaks HTTP

Cloudant is CouchDB at its core, and it is accessed via an HTTP REST API. Everything you need to do with Cloudant can be done via HTTP.

We can demonstrate this by making some simple curl requests to our new Cloudant instance. Authentication is achieved by using simple HTTP authentication, so the base URL for your instance will be something like this:

https://<username>:<password>@<hostname>.cloudant.com

Initially you’ll notice that the username and hostname are the same, but this will change as new users are added to the database.

Note: For the purposes of this article we are going to assume a username and hostname of ‘sevendays’ and a password of ‘password’

If you do a simple HTTP GET request to the above URL, you should see that your Cloudant instance is ready and waiting. You can make GET requests like this from your browser, but for most of these examples we’ll use curl so let’s start with a curl example:

Input

curl -X GET 'https://sevendays:password@sevendays.cloudant.com'

Output

{"couchdb":"Welcome","version":"1.0.2","cloudant_build":"2580"}
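When you move from curl to your own code, the credentials in the URL are usually sent as an HTTP Basic Authorization header instead, which is also what curl does behind the scenes. The following is a minimal sketch that only builds the request details without sending anything; buildRequest is a hypothetical helper, and the account name and password are the placeholder values assumed in this article.

```javascript
// Build the URL and headers for an authenticated request without sending it.
// buildRequest is a hypothetical helper, not part of any Cloudant library.
function buildRequest(account, password, path) {
  // HTTP Basic auth is "username:password", base64-encoded.
  const credentials = Buffer.from(`${account}:${password}`).toString("base64");
  return {
    url: `https://${account}.cloudant.com${path}`,
    headers: { Authorization: `Basic ${credentials}` },
  };
}

const welcomeRequest = buildRequest("sevendays", "password", "/");
console.log(welcomeRequest.url);
```

The returned object could then be handed to whichever HTTP client your application already uses.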

All actions for CouchDB are HTTP calls, and there are some great API reference docs that show off the various options. There are also alternative interfaces available: Cloudant offers a rich dashboard based on the Fauxton project, and you could run this with a local CouchDB as well. For now, we’ll stick with the curl examples so that you can see how to make the HTTP calls themselves. From your own applications you might make HTTP calls directly or use a wrapper library.

First, let’s access a list of databases. CouchDB has a special endpoint for this, /_all_dbs, so the curl call would be:

Input

curl -X GET 'https://sevendays:password@sevendays.cloudant.com/_all_dbs'

Output

["students"]

We can see that the students database is visible in the collection since we created it earlier. Now that we know where our database is, we can start adding records to it.

Add Students to the Database

We want to store students, their courses and their grades. Cloudant is a NoSQL database so our database design is quite freeform. For this example, we’ll store a record per student, and then an array of modules that the student is studying, including their grade. Here’s the simplest record getting inserted:

Input

curl -X POST -H "Content-Type: application/json" 'https://sevendays:password@sevendays.cloudant.com/students' -d '{"name": "Janet Doe", "modules": [{"name": "Calculus 101", "score": 87}]}'

Output

 
{"ok":true,"id":"62d403135819d12627a75dc2c01736ae","rev":"1-79b124addc531ca0f37fc3bbb9ca126b"}

The result here gives an ok value of true and also returns an id and a rev field. The id is what you’d expect: a unique identifier that we can use to refer to this record. The rev field is the current revision of this record; we supply it when updating data to let the database know which version of the record we based our changes on. This extra metadata is what enables CouchDB to sync between databases that may have been offline from one another for some time.
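A rev value is made up of a number, a dash and a hash. As a small sketch of how an application might read the response, the snippet below parses the body shown above and separates the two parts; splitRev is a hypothetical helper introduced only for illustration.

```javascript
// Response body copied from the POST example above.
const createResponse = JSON.parse(
  '{"ok":true,"id":"62d403135819d12627a75dc2c01736ae","rev":"1-79b124addc531ca0f37fc3bbb9ca126b"}'
);

// splitRev is a hypothetical helper: it separates the numeric prefix
// (which goes up each time the document is written) from the hash suffix.
function splitRev(rev) {
  const dash = rev.indexOf("-");
  return {
    generation: Number(rev.slice(0, dash)),
    hash: rev.slice(dash + 1),
  };
}

const firstRev = splitRev(createResponse.rev);
console.log(firstRev.generation); // 1
```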

Let’s see the data that’s currently stored in our database, using another magic endpoint from CouchDB called /_all_docs. To see what’s in the students database, the curl command would be:

Input

curl 'https://sevendays:password@sevendays.cloudant.com/students/_all_docs?include_docs=true'

Output

{
  "total_rows": 15,
  "offset": 0,
  "rows": [
    {
      "id": "62d403135819d12627a75dc2c01736ae",
      "key": "62d403135819d12627a75dc2c01736ae",
      "value": {
        "rev": "1-79b124addc531ca0f37fc3bbb9ca126b"
      },
      "doc": {
        "_id": "62d403135819d12627a75dc2c01736ae",
        "_rev": "1-79b124addc531ca0f37fc3bbb9ca126b",
        "name": "Janet Doe",
        "modules": [
          {
            "name": "Calculus 101",
            "score": 87
          }
        ]
      }
    }
  ]
}

Note that this example included the ?include_docs=true parameter on the request; this includes the body of the document (i.e. the actual data) rather than just the id and revision information.
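To make the difference concrete, here is a rough sketch of pulling the document bodies out of an _all_docs response in application code. The response objects are hand-written stand-ins in the same shape as the output above (the "student-1" id is a made-up placeholder), and docsFrom is a hypothetical helper.

```javascript
// A trimmed-down response in the same shape as the output above.
// The id "student-1" is a made-up placeholder.
const withDocs = {
  total_rows: 1,
  offset: 0,
  rows: [
    {
      id: "student-1",
      key: "student-1",
      value: { rev: "1-79b124addc531ca0f37fc3bbb9ca126b" },
      doc: {
        _id: "student-1",
        _rev: "1-79b124addc531ca0f37fc3bbb9ca126b",
        name: "Janet Doe",
        modules: [{ name: "Calculus 101", score: 87 }],
      },
    },
  ],
};

// Without include_docs=true, each row carries only id, key and value.
const withoutDocs = {
  ...withDocs,
  rows: withDocs.rows.map(({ doc, ...rest }) => rest),
};

// docsFrom is a hypothetical helper that returns whatever document
// bodies are present in a response.
function docsFrom(response) {
  return response.rows.filter((row) => row.doc).map((row) => row.doc);
}

console.log(docsFrom(withDocs).map((d) => d.name)); // [ 'Janet Doe' ]
console.log(docsFrom(withoutDocs).length); // 0
```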

With some data in place (and a few more rows added to make it more interesting), let’s look at how to modify a record, and then at how we design and use views in Cloudant.

Modifying Student Data

In the earlier example we created a record for Janet Doe, who was only studying one module – Calculus 101. If Janet then starts to study the Data Structures and Design module, we will need to update her record in the database.

To do this we need to do almost the exact same thing, with a few small changes.

  • We need to make a PUT request, rather than a POST
  • The API endpoint must also include the document ID we are updating
  • The data that we provide must also include the current revision number of this document in the _rev field

We also need to supply the whole document again, with the addition of our changes. In this case, we have added the “Data Structures and Design” module to our modules array.

Check out the example below:

Input

curl -X PUT -H "Content-Type: application/json" 'https://sevendays:password@sevendays.cloudant.com/students/62d403135819d12627a75dc2c01736ae' -d '{"_rev":"1-79b124addc531ca0f37fc3bbb9ca126b", "name": "Janet Doe", "modules": [{"name": "Calculus 101", "score": 87},{"name":"Data Structures and Design","score":78}]}'

Output

{"ok":true,"id":"62d403135819d12627a75dc2c01736ae","rev":"2-611087a6f0d8a60b06e9705fcc7566ce"}

The result has the same shape as the one from the create example earlier, but notice how the number at the start of the rev value has incremented from 1 to 2 to indicate that this document has been updated.
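In application code, the same update usually means fetching the document, changing it, and sending the whole thing back with its _rev intact. The sketch below prepares the path and body for such a PUT without sending it; addModule is a hypothetical helper, and the document is the one from the examples above.

```javascript
// The document as it would come back from a GET, including _id and _rev.
const current = {
  _id: "62d403135819d12627a75dc2c01736ae",
  _rev: "1-79b124addc531ca0f37fc3bbb9ca126b",
  name: "Janet Doe",
  modules: [{ name: "Calculus 101", score: 87 }],
};

// addModule is a hypothetical helper. It returns a new object and leaves
// the original untouched, so the unmodified copy is still available.
function addModule(doc, newModule) {
  return { ...doc, modules: [...doc.modules, newModule] };
}

const updated = addModule(current, { name: "Data Structures and Design", score: 78 });

// The document ID goes in the URL path; the full document, with _rev, is the body.
const putPath = `/students/${updated._id}`;
const putBody = JSON.stringify(updated);
console.log(putPath);
```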

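If the _rev value that is supplied is not the document’s current revision, Cloudant rejects the update with a 409 Conflict response rather than overwriting the newer data. The Cloudant documentation on document conflicts covers this in more detail.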
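The revision check can be illustrated with a toy in-memory store. This is only a sketch of the idea, not how Cloudant is implemented: ToyStore is hypothetical, its revision suffix is a fixed string rather than a real hash, and the document fields are invented for the example.

```javascript
// A toy in-memory store that mimics the revision check on writes.
// It is an illustration only; the "-toy" suffix stands in for a real hash.
class ToyStore {
  constructor() {
    this.docs = new Map();
  }

  put(id, doc) {
    const existing = this.docs.get(id);
    const currentRev = existing ? existing._rev : undefined;
    // The supplied _rev must match the stored one (or be absent for a new doc).
    if (doc._rev !== currentRev) {
      return { status: 409, error: "conflict" };
    }
    const generation = existing ? Number(currentRev.split("-")[0]) + 1 : 1;
    const rev = `${generation}-toy`;
    this.docs.set(id, { ...doc, _id: id, _rev: rev });
    return { status: 201, ok: true, id, rev };
  }
}

const store = new ToyStore();
const created = store.put("janet", { name: "Janet Doe" });
// Based on the current revision: accepted.
const firstUpdate = store.put("janet", { _rev: created.rev, name: "Janet Doe", year: 2 });
// Based on the old revision again: rejected.
const staleUpdate = store.put("janet", { _rev: created.rev, name: "Janet D." });
console.log(created.rev, firstUpdate.rev, staleUpdate.status);
```

The usual way to recover from a rejected update is to fetch the document again, reapply the change to the latest version, and retry.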

Fetch and Analyze Data with Views

To select data in Cloudant, we use views. The _all_docs endpoint that we used above is a built-in view; depending on what information we want to use in our applications, we’ll build our own views accordingly. In Cloudant/CouchDB you design views up front and they are stored as part of the database, ready to be queried whenever you need them. These views use a Map/Reduce approach and their results are indexed, which means that they remain performant even when the data sets become very large.


A view is created just like any other record, over HTTP. Views live inside a design document: we give the view a name and then make a PUT request to where we want the design document to reside. The design document is in JSON format, and we add functions for both the “map” and the “reduce” sections of our view. For our first view, we’ll fetch a list of students and show some statistics about their grades using the built-in _stats reduce function. Here’s the definition, which sets up a view called foo in a design document called students:

{
  "_id" : "_design/students",
  "views" : {
    "foo" : {
      "map" : "function (doc) { if (doc.modules && doc.modules.length > 0) { for (var idx = 0; idx < doc.modules.length; idx++) { emit(doc.name, doc.modules[idx].score); } } }",
      "reduce" : "_stats"
    }
  }
}

It’s easier to store the JSON data in a file and supply that to curl, so the above is stored in students_design.json. To create the view, we PUT to its location:

curl -X PUT -H "Content-Type: application/json" 'https://sevendays:password@sevendays.cloudant.com/students/_design/students' --data @students_design.json

We’ll get a success response and this will include the revision of the view. If you want to update your view at any point, you need to add a field _rev into your view definition and include the newest revision when making the same PUT request with the changed JSON data.

Let’s inspect what we created in that view. The map function here simply checks that the student does have a modules array, and then iterates over it. For each module, our map function will emit, or output, a record that has the student’s name as the key and their score in that module as the value.
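One way to get a feel for the map step is to run the same logic locally against a few sample documents. The sketch below is not how CouchDB executes views: runMap is a hypothetical harness, emit is passed in as an argument rather than being a global, and the sample documents (including Dominic’s module names) are invented for illustration.

```javascript
// Run a map function locally by supplying our own emit().
// runMap is a hypothetical test harness, not part of CouchDB.
function runMap(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    // Each emitted row remembers which document produced it.
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc, emit);
  }
  return rows;
}

// Same logic as the view's map function, with emit passed in explicitly.
function studentScores(doc, emit) {
  if (doc.modules && doc.modules.length > 0) {
    for (const mod of doc.modules) {
      emit(doc.name, mod.score);
    }
  }
}

// Invented sample data: ids and Dominic's module names are placeholders.
const sampleDocs = [
  {
    _id: "a",
    name: "Dominic Carter",
    modules: [
      { name: "Module A", score: 63 },
      { name: "Module B", score: 64 },
      { name: "Module C", score: 67 },
    ],
  },
  { _id: "b", name: "Janet Doe", modules: [{ name: "Calculus 101", score: 87 }] },
  { _id: "c", name: "No Modules Yet" },
];

const mapped = runMap(studentScores, sampleDocs);
console.log(mapped.length); // 4
```

A document with no modules array contributes no rows at all, which is why it never appears in the view.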

We can see the output of the map step by itself by using the URL we’d usually use to access the view (i.e. /_design/students/_view/foo) but by including a parameter ?reduce=false we will only perform the map step. Doing this is recommended if you don’t need the reduce step – but in this case we’re using it so we can inspect what happens.

Making the curl request to the view:

curl 'https://sevendays:password@sevendays.cloudant.com/students/_design/students/_view/foo?reduce=false'

The output of the view looks something like this (just the first few lines):

{
  "total_rows": 26,
  "offset": 0,
  "rows": [
    {
      "id": "04d6030005e1e873a12323602f6971eb",
      "key": "Dominic Carter",
      "value": 63
    },
    {
      "id": "04d6030005e1e873a12323602f6971eb",
      "key": "Dominic Carter",
      "value": 64
    },
    {
      "id": "04d6030005e1e873a12323602f6971eb",
      "key": "Dominic Carter",
      "value": 67
    },
    {
      "id": "62d403135819d12627a75dc2c01736ae",
      "key": "Janet Doe",
      "value": 87
    },
    {
      "id": "98ed5f8fc3ffbce7365699791325756a",
      "key": "Janice Doe",
      "value": 56
    },
...

As we can see, the map step takes a student like Janet with one module and outputs one record, but for Dominic, who has three modules, we get three records. This approach means we can easily run aggregate queries on these many small pieces of data. In this case, our view just uses the built-in _stats reduce function, so let’s see what happens when we enable the reduce step by removing the reduce parameter we set earlier.

Input

curl 'https://sevendays:password@sevendays.cloudant.com/students/_design/students/_view/foo'

Returns this output:

{"rows":[
{"key":null,"value":{"sum":1843,"count":26,"min":48,"max":89,"sumsqr":133509}}
]}
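The five fields in a _stats result are simple running totals, and they are enough to derive a mean and a variance afterwards. As a rough local equivalent (a sketch, not the built-in’s actual implementation), the function names below are our own, and the input is Dominic Carter’s three scores from the map output above.

```javascript
// A local equivalent of what the built-in _stats reducer reports.
function stats(values) {
  return values.reduce(
    (acc, v) => ({
      sum: acc.sum + v,
      count: acc.count + 1,
      min: Math.min(acc.min, v),
      max: Math.max(acc.max, v),
      sumsqr: acc.sumsqr + v * v,
    }),
    { sum: 0, count: 0, min: Infinity, max: -Infinity, sumsqr: 0 }
  );
}

// The mean and (population) variance can be derived from those totals.
function mean(s) {
  return s.sum / s.count;
}

function variance(s) {
  return s.sumsqr / s.count - mean(s) ** 2;
}

const dominic = stats([63, 64, 67]);
console.log(dominic);
// { sum: 194, count: 3, min: 63, max: 67, sumsqr: 12554 }
console.log(mean(dominic).toFixed(2));
```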

Hmmm … that’s statistics for the whole dataset, but the reason we used the student’s name as the key when we emitted values from the map function was so that we could get results per student. What we need here is the equivalent of an SQL GROUP BY clause, and in CouchDB views we apply grouping when making the request, not when designing the view. The view itself therefore needs no changes; to see the results by student, we add a ?group_level=1 parameter to the query:

curl 'https://sevendays:password@sevendays.cloudant.com/students/_design/students/_view/foo?group_level=1'

The results look a bit more interesting now!

Output

{"rows":[
{"key":"Dominic Carter","value":{"sum":194,"count":3,"min":63,"max":67,"sumsqr":12554}},
{"key":"Janet Doe","value":{"sum":87,"count":1,"min":87,"max":87,"sumsqr":7569}},
{"key":"Janice Doe","value":{"sum":127,"count":2,"min":56,"max":71,"sumsqr":8177}},
{"key":"Janine Doe","value":{"sum":232,"count":3,"min":71,"max":89,"sumsqr":18146}},
{"key":"Kelly Carter","value":{"sum":78,"count":1,"min":78,"max":78,"sumsqr":6084}},
{"key":"Laura Carter","value":{"sum":166,"count":2,"min":79,"max":87,"sumsqr":13810}},
{"key":"Lee Johnson","value":{"sum":188,"count":3,"min":57,"max":68,"sumsqr":11842}},
{"key":"Samantha Hewitt","value":{"sum":201,"count":3,"min":56,"max":76,"sumsqr":13673}},
{"key":"Simon Jones","value":{"sum":154,"count":2,"min":73,"max":81,"sumsqr":11890}},
{"key":"Stephen McDonald","value":{"sum":112,"count":2,"min":48,"max":64,"sumsqr":6400}},
{"key":"Steven Ord","value":{"sum":217,"count":3,"min":65,"max":79,"sumsqr":15795}}
]}

The group_level parameter indicates how many levels of key should be observed in grouping – we used a single key but it’s also possible to use arrays here and group by varying levels in different situations.
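To illustrate grouping with array keys, suppose a view emitted [student name, module name] as its key instead of the name alone. The sketch below imitates locally what group_level does to such rows before reducing; groupRows is a hypothetical stand-in, the reduce step is hard-coded as a sum, and Dominic’s module names are invented.

```javascript
// groupRows is a hypothetical local stand-in for what group_level does
// to map output. The reduce step here is simply a sum of the values.
function groupRows(rows, groupLevel) {
  const groups = new Map();
  for (const { key, value } of rows) {
    // Array keys are truncated to groupLevel elements; other keys are used whole.
    const groupKey = Array.isArray(key) ? key.slice(0, groupLevel) : key;
    const label = JSON.stringify(groupKey);
    if (!groups.has(label)) {
      groups.set(label, { key: groupKey, values: [] });
    }
    groups.get(label).values.push(value);
  }
  return [...groups.values()].map(({ key, values }) => ({
    key,
    value: values.reduce((a, b) => a + b, 0),
  }));
}

// Invented map output with [student, module] keys.
const scoreRows = [
  { key: ["Dominic Carter", "Module A"], value: 63 },
  { key: ["Dominic Carter", "Module B"], value: 64 },
  { key: ["Janet Doe", "Calculus 101"], value: 87 },
];

const perStudent = groupRows(scoreRows, 1);
const perStudentAndModule = groupRows(scoreRows, 2);
console.log(perStudent);
// [ { key: [ 'Dominic Carter' ], value: 127 }, { key: [ 'Janet Doe' ], value: 87 } ]
```

With the same view, group_level=1 gives one row per student and group_level=2 gives one row per student and module.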

Another view example would be to use the modules themselves as the basis of the data we want to view – for example how many students are enrolled for each module. Exactly as before, we prepare a JSON definition of the view in a file (courses_design.json in this case) and then make a PUT request to create it.

Here’s the view definition:

{
  "_id" : "_design/courses",
  "views" : {
    "foo" : {
      "map" : "function (doc) { if (doc.modules && doc.modules.length > 0) { for (var idx = 0; idx < doc.modules.length; idx++) { emit(doc.modules[idx].name, 1); } } }",
      "reduce" : "function (keys, values, rereduce) { return sum(values); }"
    }
  }
}

And the request to create it:

curl -X PUT -H "Content-Type: application/json" 'https://sevendays:password@sevendays.cloudant.com/students/_design/courses' --data @courses_design.json

The view has a map function that emits the module name as the key, and a value of one. This view also has a reduce function, which sums the values. Again, we can adjust how many levels of information we want to group by in our query, which we’ll do in our example by setting the ?group_level to 1:

curl 'https://sevendays:password@sevendays.cloudant.com/students/_design/courses/_view/foo?group_level=1'

Remember that you can also try the query with ?reduce=false to see the output of the intermediate map step; you don’t need the group_level parameter in that situation. With the query above, I get output like this:

{"rows":[
{"key":"Calculus 101","value":11},
{"key":"Data Structures and Design","value":6},
{"key":"Programming Principles","value":9}
]}
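The reduce function in this view takes a rereduce argument that it never uses. CouchDB may call a reduce function on the results of earlier reduce calls, with rereduce set to true, and summing happens to be correct in both cases. The sketch below shows this locally; the sum helper is our own stand-in for the one CouchDB provides to view functions, and the way the values are split into two batches is arbitrary.

```javascript
// Local stand-in for the sum() helper that CouchDB provides to view functions.
const sum = (values) => values.reduce((a, b) => a + b, 0);

// The reduce function from the courses view.
function reduce(keys, values, rereduce) {
  return sum(values);
}

// Eleven emitted 1s, matching the Calculus 101 count in the output above.
const ones = new Array(11).fill(1);

// Reducing everything in one go...
const direct = reduce(null, ones, false);

// ...gives the same answer as reducing two batches and then
// reducing the partial results with rereduce set to true.
const partials = [
  reduce(null, ones.slice(0, 4), false),
  reduce(null, ones.slice(4), false),
];
const combined = reduce(null, partials, true);

console.log(direct, partials, combined); // 11 [ 4, 7 ] 11
```

CouchDB also ships a built-in _sum reducer that does the same job, used in the same way as _stats.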

Working with these views is very flexible and performant, but it can seem like a bit of a learning curve if you are new to Map/Reduce. Hopefully having some examples gives you an idea of what’s possible and how to begin.

Find Students by Name

Views are a great way to find data that corresponds to a known key, but they have limitations: you need to know the key, or at least how it starts. What if we wanted to find a student by name but were unsure of their full name? A view keyed on the full name can’t find a student from their surname alone.

This is where Search Indexes come in. This is a little something extra that Cloudant have bolted onto the side of regular CouchDB and it is very powerful, allowing full-text searching using the Lucene Query Parser Syntax.

We start by defining our index, in very much the same way as we defined our view, with a few key differences:

{
  "_id": "_design/search",
  "indexes": {
    "by_name": {
      "index": "function (doc) { if (doc.name) { index(\"name\", doc.name); } }"
    }
  }
}

Let’s run through this step-by-step.

We are creating a new _design document, but instead of defining views, we are going to define indexes.

Each index has a name (by_name, in this example), and each index has a callable function (index). This function is used to define a query-able field (in this example, it is name), and the value that is associated with it (in this example we will use the value of doc.name).

We can then query our database via the API:

curl 'https://sevendays:password@sevendays.cloudant.com/students/_design/search/_search/by_name?q=name%3A"carter"&include_docs=true'

The important bit is here:

# removed URL encoding for easy reading
?q=name:"carter"

This is simply querying our new search index, looking for all students where the value “carter” is somewhere in the name field we defined.
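If you build this URL in code rather than by hand, the query needs to be URL-encoded. Here is a minimal sketch using JavaScript’s standard encodeURIComponent; searchPath is a hypothetical helper, and the database, design document and index names are the ones used in this article.

```javascript
// searchPath is a hypothetical helper that builds the path for a search
// request, URL-encoding the Lucene query so that ':' and '"' are safe.
function searchPath(db, designDoc, indexName, query) {
  const q = encodeURIComponent(query);
  return `/${db}/_design/${designDoc}/_search/${indexName}?q=${q}&include_docs=true`;
}

const carterPath = searchPath("students", "search", "by_name", 'name:"carter"');
console.log(carterPath);
// /students/_design/search/_search/by_name?q=name%3A%22carter%22&include_docs=true
```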

So, what does this get us?

{
  "total_rows": 3,
  "bookmark": "g1AAAACjeJzLYWBgYMpgTmHQSUlKzi9KdUhJstBLyilNzc2s0C3N1i0uScxLSSxKMdRLzskvTUnMK9HLSy3JAelKZEiS____f1YGk5v9h3elQCGGREZUo8yINCqPBaR7AZACGrgfbOITBgaIiVkACDs0iQ",
  "rows": [{
    "id": "229881fa954f75370c3f47b898a3927d",
    "order": [1.0582170486450195, 1],
    "fields": {},
    "doc": {
      "_id": "229881fa954f75370c3f47b898a3927d",
      "_rev": "1-a2ca609c02c1e4f2a625c7da7e86661f",
      "name": "Laura Carter",
      "modules": [ ... ]
    }
  },
  {
    "id": "ac161022ce339ec768e6f9503a805bd6",
    "order": [0.625, 0],
    "fields": {},
    "doc": {
      "_id": "ac161022ce339ec768e6f9503a805bd6",
      "_rev": "1-78cfbdeae85a78dc8cd9864e952a54cd",
      "name": "Kelly Carter",
      "modules": [ ... ]
    }
  },
  {
    "id": "04d6030005e1e873a12323602f6971eb",
    "order": [0.625, 1],
    "fields": {},
    "doc": {
      "_id": "04d6030005e1e873a12323602f6971eb",
      "_rev": "1-0a135a3fbf3f2e4bcd59b195ca437f4f",
      "name": "Dominic Carter",
      "modules": [ ... ]
    }
  }]
}

Three students are returned, all with the surname “Carter”. Perfect!

Check out the extensive Cloudant Search documentation for more information on how to build complex search queries using Cloudant.

Using the Cloudant Web Interface

The web interface allows creation, editing and usage of views, and this might be easier to work with when creating views that will later be accessed programmatically.

Cloudant Web Interface

There are a few things to look at:

  • Access views via the white sidebar. In this case I’ve drilled down to the courses one that we just created.
  • Set query options. On the upper right, click Options to see query settings. By default, Reduce is disabled, but you can tick the box to enable it.
  • Set grouping and other parameters here in the Query Options window too.
  • To include each document record and its metadata, go to the top toolbar and tick the Include Docs checkbox, which you can use with a resultset or map output. It’s off in this screenshot, because it doesn’t make sense to use it with a view that has the reduce step enabled.

Conclusion

Cloudant is a service for those developers who want to get on and build their app, rather than worry about the trials and tribulations of managing and scaling infrastructure. It has the featureset you would hope for from any database service, and its Cloudant Query feature is useful to help smooth the transition if you are coming from an SQL background. Being able to access all of these features via a very simple HTTP API is also a big positive when considering ease of use.

On the flip side, NoSQL document stores like this can be a bit of a head scratcher if you are unfamiliar with them, and using MapReduce to create views is something of a barrier at first, although being able to do this in JavaScript does make it somewhat more accessible.
