The IBM Cloudant NoSQL Database-as-a-Service allows you to store JSON documents in the cloud using a simple HTTP API. Cloudant comes equipped with a number of indexes that allow you to query your data in several powerful ways:

  • Primary Index to retrieve documents by their id, which is the primary key
  • MapReduce to do secondary key lookups and online analytics
  • Cloudant Search for full-text, wildcard and faceted search
  • Cloudant Geospatial for complex polygon and 4D spatial queries
  • Cloudant Query, a declarative query language that incorporates a number of indexing capabilities

Cloudant Query is the best way to get started with querying Cloudant databases; a simple API call is used to define the list of fields to be indexed. Under the hood, Cloudant Query can leverage various indexes to provide a full breadth of querying capabilities.

Sample Data

In order to demonstrate the new features we need some sample data. The following database contains 9,000 movie documents in the following format:

{
    "_id": "71562",
    "_rev": "1-72726eda3b8b2973ef259dd0c7410a83",
    "title": "The Godfather: Part II",
    "year": 1974,
    "rating": "R",
    "runtime": "200 min",
    "genre": [
        "Crime",
        "Drama"
    ],
    "director": "Francis Ford Coppola",
    "writer": [
        "Francis Ford Coppola (screenplay)",
        "Mario Puzo (screenplay)",
        "Mario Puzo (based on the novel \"The Godfather\")"
    ],
    "cast": [
        "Al Pacino",
        "Robert Duvall",
        "Diane Keaton",
        "Robert De Niro"
    ],
    "poster": "http://ia.media-imdb.com/images/M/..._V1_SX300.jpg",
    "imdb": {
        "rating": 9.1,
        "votes": 656,
        "id": "tt0071562"
    }
}

To use this data set:

  • Sign up for a Cloudant account
  • Replicate the database into your account. Choose Replication → New Replication and complete the form
    • Source Database: Remote database – https://examples.cloudant.com/query-movies
    • Target Database: New local database – “movies”

Creating A Cloudant Query Index

Once the data has replicated to your Cloudant account, we can instruct Cloudant to create an index from the Cloudant Dashboard by selecting the database and choosing Query → + → New Query Index:

The form will be pre-filled with an index definition of:

{
  "index": {
    "fields": [
      "foo"
    ]
  },
  "type": "json"
}

In our case, we are going to overwrite the sample with a text type index that automatically indexes all fields in all documents in the database. Replace the JSON text to have { "index": {}, "type": "text"} as shown in screenshot below:

Simply click “Create Index” to instruct Cloudant to index the movie data.

The “text” index type is new in this iteration of Cloudant Query and by default indexes all the fields in your document. We can supply the individual fields to be indexed (in the index object), but by supplying an empty object we are asking for everything to be indexed.

The same instruction can be issued using the Cloudant API —

curl -X POST https://user:pass@account.cloudant.com/movies/_index 
     -d '{ "index": {}, "type": "text"}'

— substituting user, pass and account for your own personal Cloudant credentials.

Querying A Cloudant Query Index

Cloudant Query queries are JSON documents with the following top-level items:

  • selector – which subset of the data to return; the equivalent of the WHERE part of an SQL statement
  • fields – the fields to be returned; the equivalent of the SELECT part of an SQL statement
  • sort – how the result set is to be ordered; the equivalent of the ORDER BY part of an SQL statement
  • limit – how many results to return
SQL Cloudant Query
SELECT 
   title, year
FROM movies
WHERE
  imdb.rating > 9.0
SORT year ASC
LIMIT 10
{
 "fields": ["title", "year"],
 "selector": {
   "imdb.rating": { "$gt": 9.0 }
 },
 "sort": [ { "year:number": "asc" } ],
 "limit": 10 
}

At its simplest, a query looks like this:

{
 "selector": {
  "year": 2012
 }
}

The above query is looking for films where the year field is equal to 2012.

Queries can be cut-and-pasted into the Cloudant Dashboard. Clicking “Run Query” posts the results in the right-hand panel:

The Cloudant Query API can also be used to perform queries by POSTing to a database’s _find endpoint:

curl -X POST https://user:pass@account.cloudant.com/movies/_find 
     -d '{ "selector": { "year": 2012 }, "limit": 10}'

Cloudant Query Selector

The selector part of the JSON query allows you to specify which subset of the database to return. Selectors can be dealt with several ways:

one field:

"selector": { "year": 2012 }

multiple fields:

"selector": { "year": 2012, "rating": "R" }

condition operators ($gt, $lt, $eq, $ne … see our docs for full list ):

"selector": { "imdb.rating": { "$gt": 9.0 } }

free-text match (the $text operator matches any field in your document):

"selector": { "$text": "Al Pacino" }

match arrays (exactly):

"selector": { "genre": [ "Animation", "Comedy" ] }

match value is in array:

"selector": { "genre": { "$in": ["Horror"] } }

match any values are in array:

"selector": { "year": { "$in": [2013,2015] } }

match values are not in array

"selector": { "year": { "$nin": [2013,2015] } }

the existence of fields:

"selector": { "rating": { "$exists": true } }

We can combine the $and, $or and $not operators to produce complex queries:

"selector": { 
  "$and" : [
    { "year": { "$lt": 1990 } },
    { "imdb.rating": { "$gt": 7.0 } }, 
    { "$text": "Marlon Brando" }
  ]
}
"selector": { 
  "$and" : [
    { "year": { "$gt": 1980 } },
    { "year": { "$lt": 1990 } },
    { "$not": { "title": "Aliens" } },
    { "$text": "Sigourney Weaver" }
  ]
}
"selector": { 
  "$or" : [
    { "director": "George Lucas"  },
    { "director": "Steven Spielberg"  }
  ]
}

Cloudant Query Fields

The fields element can be used to instruct the Cloudant Query engine to only return a subset of the underlying documents e.g.

{
 "selector": {
  "cast": {
   "$in": [
    "Julia Roberts"
   ]
  }
 },
 "fields": [
  "title",
  "year",
  "imdb.rating"
 ],
 "limit": 10
}

returns only partial documents e.g.

{
 "title": "Flatliners",
 "year": 1990,
 "imdb": {
  "rating": 6.5
 }
}

Cloudant Query Sort

If a sort element is supplied, then the results set is sorted according to the supplied array e.g.

{
   "selector": { 
     "cast" :  { "$in" : ["Tom Hanks"] }
   },
   "sort": [ { "year:number": "desc" } ] 
 }

With indexes where type=”text”, each field must be paired with the type of that field (number or string) to instruct Cloudant Query to treat it as a numerical or alphabetic sorting algorithm. Sort orders can be either ascending (asc) or descending (desc).

Multi-dimensional sorts can be achieved by adding to the sort array:

{
  "selector": { 
    "cast" :  { "$in" : ["Tom Hanks"] }
  },
  "sort": [ 
    { "year:number": "asc" },
    { "title:string": "asc" }
  ] 
}

Cloudant Query Pagination

When using Cloudant Query’s type=”text” indexes, pagination is performed by:

  • page 1 – performing a query to get first page of search results
  • page 2 – repeating the query but adding the bookmark parameter received in the reply to the first request

e.g.

we perform our first query:

curl -X POST https://user:pass@account.cloudant.com/movies/_find 
     -d '{ "selector": { "year": 2012 }, "limit": 10}'

which gives a reply of:

{
  "docs":[ ... ],
  "bookmark": "g2wAAAABaANkABxkYmNvcmVAZGIxLm"
}

To get the second page of results, we repeat the query and add the first request’s bookmark into our object:

curl -X POST https://user:pass@account.cloudant.com/movies/_find 
     -d '{ "selector": { "year": 2012 }, 
                         "limit": 10, 
                         "bookmark": "g2wAAAABaANkABxkYmNvcmVAZGIxLm"}'

The bookmark concept is the same mechanism used by Cloudant Search and provides a scalable way to paginate through large result sets.

What’s The Difference Between “JSON” And “Text” Indexes?

Indexes based on type=”json” become MapReduce-based materialized views under the hood. Their fixed key structure will only allow queries that match the key structure. i.e., if we create a “json” index based on title, firstname and lastname, we can perform queries based on those three fields but not just lastname, for instance. Type=”json” indexes are quicker to build and may be quicker for single-field lookups.

Indexes based on type=”text” become Lucene-based indexes under the hood and can answer arbitrary queries based on any of the indexed fields in any order. Type=”text” indexes are the easiest way to start with Cloudant Query as they index all fields by default allowing ad-hoc querying of a data set.

Watch Cloudant Query in Action

Join The Discussion

Your email address will not be published. Required fields are marked *