IBM Developer Advocacy

SELECT * FROM Cloudant



Mike Elsmore
7/29/15

Anyone working in software development over the last couple of decades has probably used Structured Query Language (SQL). Applications work with data, so a developer would learn a basic amount of MySQL, MSSQL, DB2, or even Microsoft Access—enough to persist their data and write queries to retreive and analyze it.

This means most developers are used to creating expressions like this:

SELECT u.name, p.title
FROM user u JOIN post p ON u.user_id = p.user_id
WHERE p.created > <timestamp>

For new users, it often helps to draw some rough comparisons between the SQL they already know and the querying and indexing systems present in Cloudant.

Drop the JOINs

With Cloudant, records are organized into JSON documents, not broken into tables made of rows and columns. This approach eliminates the concept of a FROM clause altogether. And because you don’t have tables, you can’t JOIN them together, either. Instead, you denormalize relational data when moving it into Cloudant, so more of it is stored together in the first place.

The whole idea is to structure your Cloudant JSON to avoid uncessary JOIN operations. But Cloudant has got a means of doing Ad-Hoc queries, corresponding to the SELECT and WHERE portions of a query. This is possible using something called Cloudant Query. It is a declarative query system, available at the _find API endpoint of a Cloudant database.

This means you can send JSON requests to /db/_find with queries similar to the following:

{
  "selector": {
    "year": {
      "$gt": 2010
    }
  },
  "fields": ["_id", "_rev", "year", "title"],
  "sort": [{"year": "asc"}],
  "limit": 10,
  "skip": 0
}

In SQL terms, you can treat the selector JSON as being like the SQL WHERE clause. Similarly, the fields is like SELECT, sort is like ORDER BY and the limit and skip fields are roughly equivalent to LIMIT and OFFSET. All that said, selector is the only required portion. You can get the specifics of the selector syntax and available operators in the API Documentation and in the Learning Center.

However, unlike in SQL where you can simply write and execute a valid query and it will work, even if it isn’t performant, you can’t do this with Cloudant Query.

Index first, then query

We need to clarify some terminology. In relational databases, an “index” is generally thought of as a performance optimization that speeds data access. In Cloudant, an “index” is actually a prerequisite for data access and is more akin to an RDBMS “materialized view.” It’s the tool for organizing variably structured JSON data. First you build an index, then you issue queries against it.

The conflation of terms is unfortunate. In Cloudant, we refer to the concept of an “index” in multiple different ways: “database view” and “secondary index” both mean the same thing. But now you know. ;-)

Cloudant Query relies on the underlying MapReduce View and Lucene Search indexing systems to find the relevant documents. When used directly, they require a little more setup before you can start asking questions of the data. Cloudant Query aims to streamline that process, and we’ve designed it to serve as the entrypoint for developers who are new to Cloudant.

Intro to Cloudant Query

CQ indexes currently have two types: json, which is the default, and text. Each type has advantages and disadvantages. If you know exactly what query you’re going to make, and want the fastest performance, you should use the json type. The json type is based on MapReduce views, which are smaller and much faster at processing. However, if you require more flexible queries with lots of conditions and operators, and can trade off the performance, then use the text type, which is based on Lucene.

To create an index in CQ, all you need to do is make a POST request to the _index API endpoint of the database in question. The most basic index you could create is the following:

{
  "index": {},
  "type": "text"
}

This will create a text index for full-text searches across all the fields in the available documents. It’s helpful to build indexes for only the fields you need, as they help with the overall performance of queries. As an example, the following is an index for the https://examples.cloudant.com/movies-demo/ database:

{
  "index": {
    "fields": ["Person_name","Person_pob","Person_dob"]
  },
  "name": "actors",
  "type": "text"
}

This index allows you to get faster queries when looking up information based against the actor’s name, date and place of birth, while still allowing for more complex query logic. As you refine your data access patterns, you can further optimize your indexing and querying by moving to the json type index. You’ll see the best performance, but with less flexible queries.

See you next time

This is the first in a three-part series of posts where I’ll introduce you to indexing and querying in Cloudant. In part two, we’ll look at advanced queries in the MapReduce View indexing system. In part three, we’ll explore full-text indexes built using Cloudant’s Lucene Search system. See you then!

blog comments powered by Disqus