Before we get into how to write and use views, let’s start with how (and when) NOT to use them. Views are a great way to get things done in Cloudant, but there are a few drawbacks to keep in mind: they must be built by an indexing process whenever they are created or changed, they take up space, and they are not always the most efficient way to retrieve data.

When a design document is created or updated, all views and indexes defined in that document must be rebuilt. When this happens, Cloudant iterates over every document in the database; for each document, it runs the map function of every view in the design doc and saves the resulting keys and values for each view. After that, whenever a document is created or updated, all the database’s views are run and updated for just that document. Those saved results are what you query against when you call a view, which is why calling views is so fast. This post gives more information about index-building, but the main thing to keep in mind is that if the database has a large number of documents or the views have complex map functions, this process can be time-consuming and operationally expensive. You can also check out this guide to Design Document Management for a deeper dive into optimizing your design docs.

How much space a view takes up likely won’t be a significant consideration unless you’re dealing with very large databases or emitting a lot of data into the view. Still, if it’s possible to accomplish the same goal without taking up additional storage space, that’s usually preferable. You can use a design doc’s _info endpoint to see how much storage your views are taking up.
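As a rough sketch of reading those numbers: the snippet below builds the path of the _info endpoint and pulls the size figures out of a response body. The database name "pets", the design doc name "pets-views", and the sample numbers are made up for illustration, and the field names under view_index.sizes are an assumption based on CouchDB-style responses, so check them against what your own account returns.

```javascript
// Build the path of a design doc's _info endpoint.
// "pets" and "pets-views" are hypothetical names used only for illustration.
function designDocInfoPath(db, ddoc) {
  return "/" + encodeURIComponent(db) + "/_design/" + encodeURIComponent(ddoc) + "/_info";
}

// Pull the size figures (in bytes) out of an _info response body.
// Assumption: sizes are reported under view_index.sizes as active/external/file.
function viewSizes(info) {
  const sizes = (info.view_index && info.view_index.sizes) || {};
  return { active: sizes.active, external: sizes.external, file: sizes.file };
}

// Sample response body with made-up numbers, trimmed to the relevant fields.
const sampleInfo = {
  name: "pets-views",
  view_index: { sizes: { active: 926691, external: 1535701, file: 1982704 } }
};

console.log(designDocInfoPath("pets", "pets-views"));
console.log(viewSizes(sampleInfo));
```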

Views are a type of secondary index, so in almost every situation using the primary index will be more efficient. Cloudant’s primary index is keyed by the documents’ “_id” fields. Views give you a lot more freedom and capability than the primary index, but clever use of document ids can often let you use the primary index in places where you might initially expect to need a secondary index. A couple of things to know if you’re thinking about how to implement this:

  • Lookups (getting a document by its “_id”) will always be faster and more efficient than calls to views and searches. For multi-tenant Cloudant on IBM Cloud, they’re also cheaper.
  • The _all_docs endpoint accesses the primary index and can be called with startkey and endkey to retrieve a subset of all the docs. Calling the _all_docs endpoint counts as a query for IBM Cloud pricing (as do views), but since it’s built in, it will be better optimized than a custom view. Also, since it’s based on the primary index, the indexing it relies on has to be done by Cloudant anyway, so you’re not adding indexing work or using additional storage space.

Keeping all this in mind, let’s look at two example cases that demonstrate how cleverly constructed “_id”s can be used to replace views. The first case shows example documents for a database with randomly generated ids and three views. The second case shows example documents for the same database, but with ids constructed such that the views can be replaced with calls to _all_docs.

For both cases, we’ll look at a “Pets” database that contains three types of documents: category, breed, and food. Category docs contain info about broad pet categories, such as dog, cat, fish, etc. Breed docs contain info about specific breeds in each category. Food docs contain info about pet foods eaten by different breeds. The user/application needs to be able to access a list of all categories, the doc for a specific category, a list of all breeds in a certain category, and a list of all foods eaten by a certain breed. The example docs below demonstrate the schema of the different doc types and how the user/application could get the data it needs from those docs.

First Case: Without using “_id”s

For the first case, here’s what the database might look like using views and NOT utilizing the “_id” field:

Example docs:

Category:

Breed:

Food:

Design doc:

Use case for each view:

  • “categories”: get all categories; pull doc for specific category (call with key=<category>&include_docs=true)
  • “breed-by-category”: get all breeds for a category (call with key=<category>)
  • “food-by-breed”: get all foods for a breed (call with key=<breed>)
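To make the three views concrete, here is a minimal sketch of what such a design doc could look like, together with a tiny local simulation of querying a view by key. The design doc name "_design/pets", the emitted values, and the sample docs are assumptions for illustration, not the original post's code; the emit() defined here is only a stand-in for the one Cloudant provides to map functions.

```javascript
// Sample docs for the first case: random ids, with a "type" field to tell docs apart.
const docs = [
  { _id: "loqueseadadasdasd", type: "category", name: "Dog" },
  { _id: "iotiueiwoiqri3232", type: "breed", name: "Golden Retriever", category: "Dog" },
  { _id: "turutureewe234", type: "food", name: "Dog Food (Golden Retriever)", breed: "Golden Retriever" }
];

// Local stand-in for the emit() that Cloudant provides to map functions.
let currentId = null;
let emitted = [];
function emit(key, value) {
  emitted.push({ id: currentId, key: key, value: value });
}

// One map function per view; each one only looks at its own doc type.
const maps = {
  "categories": function (doc) {
    if (doc.type === "category") { emit(doc.name, null); }
  },
  "breed-by-category": function (doc) {
    if (doc.type === "breed") { emit(doc.category, doc.name); }
  },
  "food-by-breed": function (doc) {
    if (doc.type === "food") { emit(doc.breed, doc.name); }
  }
};

// Design docs store each map function as a string.
const designDoc = { _id: "_design/pets", views: {} };
for (const name of Object.keys(maps)) {
  designDoc.views[name] = { map: maps[name].toString() };
}

// Simulate calling a view with key=<key>: run the map over every doc, keep matching rows.
function queryView(name, key) {
  emitted = [];
  for (const doc of docs) {
    currentId = doc._id;
    maps[name](doc);
  }
  return emitted.filter(function (row) { return row.key === key; });
}

console.log(queryView("breed-by-category", "Dog"));
console.log(queryView("food-by-breed", "Golden Retriever"));
```

Each view needs its own map function, and every one of them runs against every doc in the database when the index is built, even though each only cares about one doc type.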

Second Case: Using “_id”s to Replace Views

For the second case, here’s what the same database would look like utilizing the “_id” field to replace the views:

The following templates would be used to construct doc ids for each document type:

  • category – “category:<category name>”
  • breed – “breed:<category name>:<breed name>”
  • food – “food:<category name>:<breed name>:<food name>”

Example docs:

Category:

Breed:

Food:
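For illustration, the id templates can be written as small helper functions and used to build one doc of each type. The helper names, the food name "Salmon Kibble", and the extra fields are assumptions made for this sketch.

```javascript
// The three id templates, written as small helper functions (names are hypothetical).
function categoryId(category) {
  return "category:" + category;
}
function breedId(category, breed) {
  return "breed:" + category + ":" + breed;
}
function foodId(category, breed, food) {
  return "food:" + category + ":" + breed + ":" + food;
}

// One example doc per type. No "type" field is needed here because the id prefix carries it.
const categoryDoc = { _id: categoryId("Dog"), name: "Dog" };
const breedDoc = {
  _id: breedId("Dog", "Golden Retriever"),
  name: "Golden Retriever",
  category: "Dog"
};
const foodDoc = {
  _id: foodId("Dog", "Golden Retriever", "Salmon Kibble"), // made-up food name
  name: "Salmon Kibble",
  breed: "Golden Retriever"
};

console.log(categoryDoc._id);
console.log(breedDoc._id);
console.log(foodDoc._id);
```

This scheme assumes that category, breed, and food names never contain the ":" separator themselves; if they could, the names would need to be escaped or a different separator chosen.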

Replace view for each use case:

  • get all categories: /_all_docs?startkey="category:"&endkey="category;"
  • pull doc for specific category (using a lookup instead of a view, which will decrease cost as well as improve efficiency): /category:<category>
  • get all breeds for a category: /_all_docs?startkey="breed:<category>:"&endkey="breed:<category>;"
  • get all foods for a breed: /_all_docs?startkey="food:<category>:<breed>:"&endkey="food:<category>:<breed>;"

(Note that the endkey in each case ends with a semicolon instead of a colon. This is because _all_docs uses ASCII collation to sort doc ids, and “;” comes after “:” in ASCII.)
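The colon/semicolon trick can be sketched as follows: one helper builds the _all_docs query string for a given id prefix, and a second one simulates the same range locally over a sorted list of ids. The helper names and sample ids are made up for illustration; the database path is left out, as in the list above.

```javascript
// Build the _all_docs query string for every id that starts with the given prefix.
// The prefix is expected to end with ":"; the endkey swaps that final ":" for ";",
// the next character in ASCII. Keys are JSON strings, so they are JSON-encoded
// first and then URL-encoded.
function prefixRangeQuery(prefix) {
  const startkey = prefix;
  const endkey = prefix.slice(0, -1) + ";";
  return "/_all_docs?startkey=" + encodeURIComponent(JSON.stringify(startkey)) +
    "&endkey=" + encodeURIComponent(JSON.stringify(endkey));
}

// Sample ids (made up). The default sort compares code units, which matches
// ASCII order for plain-ASCII ids like these.
const ids = [
  "food:Dog:Golden Retriever:Salmon Kibble",
  "category:Dog",
  "breed:Dog:Poodle",
  "category:Cat",
  "breed:Cat:Siamese",
  "breed:Dog:Golden Retriever"
].sort();

// Local simulation of the range: keep ids between startkey and endkey, inclusive.
function idsInRange(startkey, endkey) {
  return ids.filter(function (id) { return id >= startkey && id <= endkey; });
}

console.log(prefixRangeQuery("breed:Dog:"));
console.log(idsInRange("breed:Dog:", "breed:Dog;"));
console.log(idsInRange("category:", "category;"));
```

Because every id with the prefix "breed:Dog:" sorts after "breed:Dog:" itself and before "breed:Dog;", the range picks up exactly the breeds in that category and nothing else.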

The above setup assumes that each pet food is only eaten by one breed. If that’s not the case, then a view would probably be necessary, and the docs might look like this:

Food:

Design doc:

The view in the above design doc loops through the array of “breeds” in the food doc and emits each breed as a key, with the name of the food as the value. To get all the foods for a breed, you would query the “food-by-breed” view with key=<breed>.
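A minimal sketch of such a map function, run locally against two made-up food docs, might look like this. The doc ids, food names, and the "type" check are assumptions for illustration, and the emit() here is only a local stand-in for the one Cloudant provides.

```javascript
// Food docs where each food lists every breed that eats it (sample data, made up).
const foodDocs = [
  { _id: "food:Salmon Kibble", type: "food", name: "Salmon Kibble", breeds: ["Golden Retriever", "Poodle"] },
  { _id: "food:Puppy Mix", type: "food", name: "Puppy Mix", breeds: ["Poodle"] }
];

// Local stand-in for the emit() that Cloudant provides to map functions.
let currentId = null;
let emitted = [];
function emit(key, value) {
  emitted.push({ id: currentId, key: key, value: value });
}

// Map function for "food-by-breed": one row per (breed, food) pair.
function foodByBreed(doc) {
  if (doc.type === "food" && Array.isArray(doc.breeds)) {
    doc.breeds.forEach(function (breed) {
      emit(breed, doc.name);
    });
  }
}

// Simulate calling the view with key=<breed>.
function foodsForBreed(breed) {
  emitted = [];
  foodDocs.forEach(function (doc) {
    currentId = doc._id;
    foodByBreed(doc);
  });
  return emitted.filter(function (row) { return row.key === breed; });
}

console.log(foodsForBreed("Poodle"));
console.log(foodsForBreed("Golden Retriever"));
```

Note that a single food doc now produces several rows in the view, one for each breed in its array.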

Hopefully these examples help you think through ways to best utilize Cloudant’s primary index for accessing your data. Now we can move on to what to consider when you decide you do need a view.

What should be “emitted” into the view?

When you write the JavaScript map function for your view, it will “emit” a key and value to the view. The key and value can be any data type, including null. When you query the view, you will receive an array of objects, each of which has a “key” field, a “value” field, and an “id” field containing the “_id” of the doc that the key and value came from. The keys and values that you choose to emit will be saved with the view and will contribute to the amount of space it takes up; therefore, it’s usually preferable to emit only the data you need from the view. With that in mind, here are a few rules of thumb when deciding what to emit to your view:

You don’t need to emit the “_id”

The doc’s id will already be saved as part of the view, so there’s usually no reason to also emit it. If the “_id” field is the only thing you need from the view, you can emit(null, null). The main exceptions are when you want to use “startkey” and “endkey” on the id, or run a reduce for a certain id; in those cases you would emit the id as the key. Besides that, it’s generally unnecessary.

It is, however, sometimes useful to emit(some_key, {"_id": "<id of a DIFFERENT doc>"}), because if you do that and use include_docs=true, Cloudant will return the full doc whose “_id” is in the value rather than the doc that the row came from.
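As a sketch of that pattern, the map function below emits a breed's name as the key and a pointer to its category doc as the value. The doc ids and the function names are made up for illustration, and queryWithIncludeDocs is only a local simulation of how include_docs=true resolves the “_id” in the value; it is not Cloudant's implementation.

```javascript
// Sample docs keyed by id (made up for illustration).
const docsById = {
  "category:Dog": { _id: "category:Dog", name: "Dog" },
  "breed:Dog:Poodle": { _id: "breed:Dog:Poodle", name: "Poodle", category: "Dog" }
};

// Local stand-in for the emit() that Cloudant provides to map functions.
let currentId = null;
let emitted = [];
function emit(key, value) {
  emitted.push({ id: currentId, key: key, value: value });
}

// For each breed doc, emit a pointer to its category doc instead of any breed data.
function categoryOfBreed(doc) {
  if (doc._id.indexOf("breed:") === 0) {
    emit(doc.name, { _id: "category:" + doc.category });
  }
}

// Simulate include_docs=true: when the value has an "_id", attach that doc;
// otherwise attach the doc the row came from.
function queryWithIncludeDocs(mapFn) {
  emitted = [];
  Object.keys(docsById).forEach(function (id) {
    currentId = id;
    mapFn(docsById[id]);
  });
  return emitted.map(function (row) {
    const linkedId = row.value && row.value._id ? row.value._id : row.id;
    return { id: row.id, key: row.key, value: row.value, doc: docsById[linkedId] || null };
  });
}

console.log(queryWithIncludeDocs(categoryOfBreed));
```

The row's “id” still points at the breed doc that emitted it, while the attached doc is the category, which gives a simple one-step join.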

You (probably) shouldn’t emit the whole doc

If you will usually only need some subset of the data from the doc in the view, then just emit the data you’re interested in. When calling a view, you can set the include_docs query argument to true to retrieve the full document along with the id, key, and value, so it’s usually unnecessary to emit the whole document into the view. Having the whole doc in the view takes up more space and can slow down view queries because it’s sending more data. That being said, using include_docs=true is slightly slower than emitting the doc, because it has to do a lookup for each doc instead of having it on hand in the view.

If you will need to access the whole document for every row in your view every time you call that view, and the amount of storage space taken up by the view is not a concern (i.e. with a relatively small db or small number of rows in view), then you can go ahead and emit the whole doc. This will essentially save a duplicate of the doc in the view, which will take up additional space, but it will also be faster to retrieve.
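To get a feel for the storage side of this trade-off, the sketch below runs a lean map function and a whole-doc map function against the same made-up breed doc and compares the size of the resulting rows. Serialized JSON length is only a rough proxy chosen for this illustration; it is not how Cloudant measures index size.

```javascript
// A sample breed doc (made up) with one bulky field that most queries never need.
const breedDoc = {
  _id: "breed:Dog:Poodle",
  type: "breed",
  name: "Poodle",
  category: "Dog",
  description: "A long free-text description of the breed that most queries never need."
};

// Local stand-in for the emit() that Cloudant provides to map functions.
let emitted = [];
function emit(key, value) {
  emitted.push({ id: breedDoc._id, key: key, value: value });
}

// Lean view: emit only the field the caller needs.
function leanMap(doc) {
  if (doc.type === "breed") { emit(doc.category, doc.name); }
}

// Heavy view: emit the whole doc as the value.
function fullDocMap(doc) {
  if (doc.type === "breed") { emit(doc.category, doc); }
}

// Rough proxy for row size: length of the row serialized as JSON.
function rowSize(mapFn) {
  emitted = [];
  mapFn(breedDoc);
  return JSON.stringify(emitted[0]).length;
}

const leanSize = rowSize(leanMap);
const fullSize = rowSize(fullDocMap);
console.log("lean row:", leanSize, "characters; full-doc row:", fullSize, "characters");
```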

Understand view collation to make useful keys

Querying views by key and/or with startkey and endkey can be a convenient way to access data, so choosing useful keys is important for many view use cases. For example, you can emit date/time values as the key and use startkey/endkey to retrieve data for a certain date/time range. You can also build complex keys, such as an array containing time and data type, so you can retrieve a certain data type over a certain time period. If you plan on using startkey/endkey with your view, be sure to check out this doc on view collation so you understand how CouchDB (which Cloudant is based on) orders the rows in a view.
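As an example of a complex key, the sketch below emits [type, time] arrays and then selects one data type over a time range. The sample readings, field names, and function names are made up, and compareKeys is a simplified comparator that only handles arrays of plain-ASCII strings; real view collation covers all JSON types and follows CouchDB's own rules.

```javascript
// Sample readings (made up). Times are ISO 8601 strings, which sort chronologically.
const readings = [
  { _id: "r1", type: "temperature", time: "2018-08-31T23:00:00Z", value: 20.1 },
  { _id: "r2", type: "temperature", time: "2018-09-05T08:00:00Z", value: 21.5 },
  { _id: "r3", type: "humidity", time: "2018-09-05T08:00:00Z", value: 40 },
  { _id: "r4", type: "temperature", time: "2018-10-02T08:00:00Z", value: 18.2 }
];

// Local stand-in for the emit() that Cloudant provides to map functions.
let currentId = null;
let emitted = [];
function emit(key, value) {
  emitted.push({ id: currentId, key: key, value: value });
}

// Complex key: data type first, then time, so one type's rows sit together in time order.
function byTypeAndTime(doc) {
  if (doc.type && doc.time) { emit([doc.type, doc.time], doc.value); }
}

// Simplified comparator for arrays of strings: element by element, shorter array first on ties.
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) { return -1; }
    if (a[i] > b[i]) { return 1; }
  }
  return a.length - b.length;
}

// Simulate a view query with startkey and endkey (both inclusive).
function queryRange(startkey, endkey) {
  emitted = [];
  readings.forEach(function (doc) {
    currentId = doc._id;
    byTypeAndTime(doc);
  });
  return emitted
    .sort(function (x, y) { return compareKeys(x.key, y.key); })
    .filter(function (row) {
      return compareKeys(row.key, startkey) >= 0 && compareKeys(row.key, endkey) <= 0;
    });
}

// All temperature readings from September 2018.
console.log(queryRange(["temperature", "2018-09-01T00:00:00Z"], ["temperature", "2018-09-30T23:59:59Z"]));
```

Putting the data type first is what makes this work: a startkey/endkey range is a contiguous slice of the sorted rows, so the field you want to hold fixed has to come before the field you want to range over.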

2 comments on "A guide to writing (and avoiding) views in Cloudant"

  1. Brijesh verma September 05, 2018

    Hi,

    This question may not be relevant, but I need some input on it.
    Is there any operator supported by Cloudant to compare strings in a case-sensitive way?
    Ex: {
    "String1":"Text"
    }

    input : text

    I want to compare these two, which should give the output true/yes.

  2. Category

    {
    "_id":"loqueseadadasdasd",
    "type":"category",
    "name":"Dog"
    }

    Breed:

    {
    "_id":"iotiueiwoiqri3232",
    "type":"breed",
    "name":"Golden Retriever",
    "category":"Dog"
    }

    Food:

    {
    "id":"turutureewe234",
    "type":"food",
    "name":"Dog Food (Golden Retriever)",
    "breed":"Golden Retriever"
    }
