Cloudant allows custom JavaScript to be run server-side to build MapReduce and Lucene search indexes. This article outlines common pitfalls and their solutions, and advocates automated testing of such code.

Writing data into Cloudant is easy. Simply post data to your database using our RESTful API. When it comes to querying your data, most folks choose Cloudant Query as it is easy to set up and has a simple declarative language that allows queries to be expressed as JSON.

Many customers, however, need to use the raw power of Cloudant MapReduce or Cloudant Search, both of which require JavaScript functions to be uploaded in design documents. Because indexes are built asynchronously in Cloudant, it may be some time before a mistake in a JavaScript function is spotted, especially with large databases.

This article describes the most common pitfalls, how to avoid them, and how to code defensively in JavaScript functions.

Cloudant MapReduce functions issue “emit” function calls for each item to be indexed. Cloudant Search functions issue “index” function calls for each item to be indexed. The code below refers mainly to MapReduce functions, but the same guidance can be followed for Search functions too.

Syntax Error

It is important to watch for, and avoid, syntax errors in your JavaScript code.

The Cloudant Dashboard helps you check the syntax of any JavaScript code you enter:

[Figure: Map Function]
This code has an extra ‘open’ bracket in the call to ‘emit’. A red cross on the offending line shows where the error is. This makes it impossible to submit code that is syntactically incorrect.

By contrast, the Cloudant API cannot perform these kinds of syntax checks in advance. If we submit the same code through the API, it is accepted, because the API is only responsible for checking that the document is valid JSON:

curl -X POST "https://myaccount.cloudant.com/js" \
 -H 'Content-type: application/json' \
 -d '{"_id":"_design/fetch","views":{ "bya":{ "map": "function(doc){ emit((doc.a,null);}" }}}'
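
One way to catch this kind of mistake before posting through the API is to try compiling the function source locally. The sketch below is an assumption, not a Cloudant feature: checkMapSyntax is a hypothetical helper that hands the source string to JavaScript's Function constructor, which throws a SyntaxError for malformed code without ever running the map function.

```javascript
// Hypothetical helper: returns null when the source compiles,
// or the error message when it does not.
function checkMapSyntax(source) {
  try {
    // Wrapping the source in parentheses turns the anonymous function into
    // an expression. The Function constructor only compiles the code; the
    // map function itself is never called.
    new Function("return (" + source + ");");
    return null;
  } catch (e) {
    return e.message;
  }
}

var broken = "function(doc){ emit((doc.a,null);}";
var fixed = "function(doc){ emit(doc.a,null);}";

console.log(checkMapSyntax(broken)); // prints the engine's syntax error message
console.log(checkMapSyntax(fixed));  // null
```

A local engine may accept newer syntax than the engine that builds the indexes, so this is best treated as a first check only.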

Missing Doc

A common error is to omit the “doc.” prefix when accessing a property of the document, for example writing emit(a, null) where emit(doc.a, null) was intended.
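
A minimal sketch of this mistake, assuming documents with a property “a”. The names brokenMap and fixedMap, and the stand-in emit, are hypothetical and exist only so that the snippet runs outside Cloudant.

```javascript
// Stand-in for the emit function that is available to map functions.
var emitted = [];
var emit = function(key, value) {
  emitted.push({ key: key, value: value });
};

// Mistake: "a" is not a variable in scope, so this throws a ReferenceError.
var brokenMap = function(doc) {
  emit(a, null);
};

// Fix: read the property from the doc object.
var fixedMap = function(doc) {
  emit(doc.a, null);
};

var doc = { _id: "1", a: 1, b: 2 };

try {
  brokenMap(doc);
} catch (e) {
  console.log(e.name); // ReferenceError
}

fixedMap(doc);
console.log(emitted.length); // 1
```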

Exceptions Lead To Documents Not Being Indexed

Cloudant is a schema-less database. Each document in a database can have a different schema to the next one. Unless a map function behaves defensively, it may attempt to access a non-existent property of the doc object, resulting in an exception. A thrown exception results in no indexing activity for the document being processed.

Let’s say our documents have this form, with an additional property “c” being optional:

{
  "_id": "1",
  "a": 1,
  "b": 2
}

And our map function looks like this:

function(doc) {
  emit(doc.a, null);
  emit(x, null);
}

As the attempt to emit “x” throws an exception, the perfectly valid call to emit “doc.a” will not reach the index. The above example is poor code, because ‘x’ can never exist, but if we have some documents with a ‘c’ property and some without, what happens if we try to index a non-existent property of doc?

function(doc) {
  emit(doc.a, null);
  emit(doc.c, null);
}

The answer is that the second emit causes a key of “null” to be stored in the index. This can be prevented with some defensive coding:

[Figure: Defensive Coding Map Function]
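
A minimal sketch of what such a defensive map function could look like, assuming that “a” is always wanted and “c” is optional. The stand-in emit and the sample documents are hypothetical, there only so that the snippet runs outside Cloudant.

```javascript
// Stand-in for the emit function that is available to map functions.
var emitted = [];
var emit = function(key, value) {
  emitted.push({ key: key, value: value });
};

var map = function(doc) {
  emit(doc.a, null);
  // Only emit "c" when the document actually has a value for it.
  if (doc.c !== undefined && doc.c !== null) {
    emit(doc.c, null);
  }
};

map({ _id: "1", a: 1, b: 2 });       // emits one key: 1
map({ _id: "2", a: 3, b: 4, c: 5 }); // emits two keys: 3 and 5

console.log(emitted.length); // 3
```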

The Truth Is Out There

Take what seems to be a reasonable map function:

function(doc) {
  if (doc.a) {
    emit(doc.a, null);
  }
}

The problem with this function arises from JavaScript’s definition of truth. We can see what this means by thinking about the different values of ‘a’ that might be seen, and the effect they would have in the ‘if’ evaluation.

[Figure: Map Function Evaluation]
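
The evaluation can also be reproduced locally. This sketch uses a hypothetical list of sample values and reports whether each value of “a” would pass the if (doc.a) test and therefore be emitted:

```javascript
// Sample values that "a" might hold in a schema-less database.
var samples = [
  { label: '1', a: 1 },
  { label: '0', a: 0 },
  { label: '"hello"', a: "hello" },
  { label: '""', a: "" },
  { label: 'true', a: true },
  { label: 'false', a: false },
  { label: 'null', a: null },
  { label: 'undefined', a: undefined },
  { label: '[]', a: [] },
  { label: '{}', a: {} },
  { label: 'NaN', a: NaN }
];

var results = samples.map(function(s) {
  // Boolean(x) applies the same truthiness rule that `if (x)` uses.
  return { label: s.label, passes: Boolean(s.a) };
});

results.forEach(function(r) {
  console.log(r.label + " -> " + (r.passes ? "emitted" : "skipped"));
});
```

Zero, the empty string and false are skipped along with null and undefined, even though they may be legitimate values to index.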

So our map function has to be careful to:

  • Avoid indexing null keys, which happens when doc.a is emitted while it is null or undefined
  • Ensure that only the correct data types are emitted into the index

Our original map function does not have the first problem, but unfortunately it fails the second requirement. This shortcoming might cause our code to discard documents that should be indexed (documents where ‘a’ is zero, for instance).

The best practice here is to check the type of an object’s property with typeof before emitting it:

function(doc) {
  if (typeof(doc.a) === "string" && typeof(doc.b) === "number") {
    emit(doc.a, doc.b);
  }
}

The typeof operator does not throw an exception, even when checking for properties that don’t exist. ‘typeof’ lets you check the type of each variable before anything is inserted into the index. One thing to watch out for is null, whose type is reported as “object”:

console.log( typeof null) 
// object !!

See http://james.padolsey.com/javascript/truthy-falsey/ for more JavaScript logical anomalies.

Dates – Not All JavaScript Engines Are The Same

Let’s say we have documents like this:

{
  "_id": "1",
  "date": "2014-08-15",
  "temperature": 17.8
}

Because JSON has no built-in Date type, strings are often used to represent the date in YYYY-MM-DD format. One technique for indexing this data is to break the date into its constituent parts, and index them in a compound key constructed out of Numbers:

function(doc) {
  if (typeof(doc.date) === "string" && typeof(doc.temperature) === "number") {
    var bits = doc.date.split("-");
    var year = parseInt(bits[0]);
    var month = parseInt(bits[1]);
    var day = parseInt(bits[2]);
    emit( [year, month, day], doc.temperature);
  }
}

Given our document, we would expect keys to be emitted of the following form:

[ 2014, 8, 15] → 17.8

but we don’t. Instead we get:

[ 2014, 0, 15] → 17.8

What’s going on? In some JavaScript engines, notably those that predate ECMAScript 5, the parseInt function assumes that a string with a leading ‘0’ represents an octal number. “08” is nonsensical in octal, so zero is returned. Fortunately, parseInt allows us to force the base using the second parameter:

function(doc) {
  if (typeof(doc.date) === "string" && typeof(doc.temperature) === "number") {
    var bits = doc.date.split("-");
    var year = parseInt(bits[0], 10);
    var month = parseInt(bits[1], 10);
    var day = parseInt(bits[2], 10);
    emit( [year, month, day], doc.temperature);
  }
}

A better solution is to use JavaScript’s built-in Date object:

function(doc) {
  if (typeof(doc.date) === "string" && typeof(doc.temperature) === "number") {
    var d = new Date(doc.date);
    // getDate() returns the day of the month; getDay() would return the day of the week
    emit([ d.getFullYear(), d.getMonth()+1, d.getDate()], doc.temperature);
  }
}

However, if the string cannot be converted to a Date object, calls to functions such as “getFullYear” will return “NaN”, so an even better solution is to check the Date object before using it:

function(doc) {
  if (typeof(doc.date) === "string" && typeof(doc.temperature) === "number") {
    var d = new Date(doc.date);
    if (!isNaN(d.getTime())) {
      emit([ d.getFullYear(), d.getMonth()+1, d.getDate()], doc.temperature);
    }
  }
}
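
To see the effect of the validity check, the function can be run locally against a good and a bad date. This is a sketch under assumptions: the stand-in emit and the sample documents are hypothetical, and the UTC getters (getUTCFullYear, getUTCMonth, getUTCDate) are swapped in for the local-time getters so that the result does not depend on the time zone of the machine running it.

```javascript
// Stand-in for the emit function that is available to map functions.
var emitted = [];
var emit = function(key, value) {
  emitted.push({ key: key, value: value });
};

var map = function(doc) {
  if (typeof doc.date === "string" && typeof doc.temperature === "number") {
    var d = new Date(doc.date);
    // An unparseable string yields an invalid Date whose getTime() is NaN.
    if (!isNaN(d.getTime())) {
      // A date-only ISO string is parsed as midnight UTC, so read it back in UTC.
      emit([d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate()], doc.temperature);
    }
  }
};

map({ _id: "1", date: "2014-08-15", temperature: 17.8 }); // indexed
map({ _id: "2", date: "not a date", temperature: 21.0 }); // skipped

console.log(JSON.stringify(emitted)); // [{"key":[2014,8,15],"value":17.8}]
```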

Manually Testing Map/Index Functions

Code that is uploaded to Cloudant to run over each document in a database is just code. So it can be put through the rigours of testing and review that all production code should go through. A simple way to test a map function is to create a test harness, call it with some data and see what values it emits:

var emit = function(key, value) {
  console.log(key,"--->", value);
}
var doc = { a: 1, b:2};
var map = function(doc) { /* your code goes here */ };
map(doc);

Automatically Testing Map Functions

The automated test framework mocha, paired with the assertion library should.js, allows automated tests to be written in JavaScript. As map functions are pure JavaScript, they too can be automatically tested.

var should = require('should');
var validdoc1 = { _id:"1", year:2014, month:8, temperature: 12};
var validdoc2 =  { _id:"6", year:2014, month:8, temperature: 13.2};
var docs = [
  validdoc1,
  { _id:"2", month:8, temperature: 12},
  { _id:"3", year:2014, temperature: 12},
  { _id:"4", year:2014, month:8},
  { _id:"5", year:null, month:null, temperature: null},
  validdoc2
];
var theIndex = []

describe('Array', function(){
  before(function() {
    var emit = function(key, value) {
      theIndex.push( {key: key, value: value});
    }
    var map = function(doc) {
      if (typeof doc.year == 'number' && typeof doc.month == 'number' 
           && typeof doc.temperature == 'number') {
        emit([doc.year,doc.month], doc.temperature);
      }
    };
    for(var i in docs) {
      map(docs[i]);
    }
  });
  
  describe('#map function', function(){
    
    it('should reject invalid documents', function() {
      theIndex.should.be.an.Array;
      theIndex.length.should.be.a.Number;
      theIndex.length.should.be.equal(2);
    });
    
    it('should index year, month and temperature correctly', function() {
      var first = theIndex[0];
      var second = theIndex[1];

      first.should.have.property('key');
      first.should.have.property('value');
      first['key'].should.be.an.Array;
      first['key'].length.should.be.equal(2);
      first['key'][0].should.be.a.Number;
      first['key'][0].should.be.equal(validdoc1.year);
      first['key'][1].should.be.a.Number;
      first['key'][1].should.be.equal(validdoc1.month);
      first['value'].should.be.a.Number;
      first['value'].should.be.equal(validdoc1.temperature);
      
      second.should.have.property('key');
      second.should.have.property('value');
      second['key'].should.be.an.Array;
      second['key'].length.should.be.equal(2);
      second['key'][0].should.be.a.Number;
      second['key'][0].should.be.equal(validdoc2.year);
      second['key'][1].should.be.a.Number;
      second['key'][1].should.be.equal(validdoc2.month);
      second['value'].should.be.a.Number;
      second['value'].should.be.equal(validdoc2.temperature);
    });
  });
})

Trying a map function in a test harness before running it over millions of documents can save many development hours!

Another approach would be to use PouchDB Server as a test harness for MapReduce operations. As PouchDB is CouchDB compatible, it can be used to automatically test map functions prior to deployment in production.
