Geospatial querying is a basic requirement for many modern applications. Some apps are map-centric, like Yelp, Hotels.com, or retail store finders, and help users find places nearby. Other geospatial use cases live deep under the covers of an app, like a to-do list app that notifies you when you’re near a place where you can accomplish a task.

This is a quick tutorial on how to use Cloudant Search to add geospatial queries to your apps.

Geospatial query options in Cloudant

First off, as a developer, you need to know that there are two options for performing geospatial queries in Cloudant:

  • Cloudant Geo offers the most flexible geospatial query options. You can query by radius, rectangle, and polygon, but you can’t query by any other attributes of the database at the same time. (At least not today, but engineering elves are hard at work building this feature!)

  • Cloudant Search supports only rectangular bounding-box queries, but unlike Cloudant Geo, it lets you combine them with attribute and free-text search. If you’re searching for a doctor, seeing mechanics in the results gets in the way, so refining a geospatial search with additional attributes is a must in many cases. With a small result set, it’s easy to filter client-side, but when the result set gets big (for instance, in a densely populated city), a simple geo index won’t cut it, because you really want to include the additional search requirements alongside your location data.

    Cloudant Search is powered by Apache Lucene, a widely used open-source search library. By drawing on the speed and simplicity of Lucene, the Cloudant service provides a familiar way to add search to apps.

    Cloudant Search lets you further enhance indexing and querying with:

    • Ranked searching. Search results can be ordered by relevance or by custom sort fields
    • Powerful query types, including phrase queries, wildcard queries, proximity queries, fuzzy searches, range queries and more
    • Language-specific analyzers
    • Faceted search and filtering
    • Bookmarking. Paginate results in the style of popular Web search engines

Indexing Boston crime data for Search

There’s already a host of excellent resources on indexing and querying with Cloudant Search, so if you’re not familiar with the basics, start here:

Once you’re up-to-speed, we can have some fun with crime data! We’ll use a sample of crimes in Boston, MA provided by the city government as open data here. We already have this data in Cloudant, and you can view a sample here, or replicate the database to your own Cloudant account. If you want to follow along while coding and don’t already have a Cloudant account, sign up for a free trial here.

The first thing we need to do to the database is define our Search index. Here is the JavaScript function for that:

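function (doc) {
  var crime = doc.properties && doc.properties.main_crimecode;
  var coords = doc.geometry && doc.geometry.coordinates;
  // Index only documents that have a crime code and a numeric [longitude, latitude] pair.
  // Checking the type (rather than truthiness) keeps a coordinate of exactly 0 indexable.
  if (crime && coords && typeof coords[0] === "number" && typeof coords[1] === "number") {
      // Store and facet the crime type so it comes back with each search result.
      index("type", crime, {"store": true, "facet": true});
      index("long", coords[0]);
      index("lat", coords[1]);
  }
}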

I save this to the crimes database in a design document called lucenegeoblog and name the index findcrimes (those two names will be important next, when we write our queries).

Note that I’m indexing three properties of each document, and indexing a document only if all of those properties exist.

  • doc.properties.main_crimecode tells us what the crime was (or at least the main crime, since people could be doing more than one bad thing at the same time)
  • doc.geometry.coordinates[0] is where the longitude value for the crime’s location lives
  • doc.geometry.coordinates[1] is where the latitude value for the crime’s location lives
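
One way to see what an index function like this emits, before saving it to a design document, is to run a local copy against a stub. The sketch below is only an illustration: the local index() stand-in, the indexCrime and collectIndexedFields names, and the sample document are all hypothetical (in Cloudant, index() is supplied by the service, and the real documents carry more properties than this).

```javascript
// Hypothetical local stand-in for Cloudant's index(); it only records each call.
let emitted = [];
function index(name, value, options) {
  emitted.push({ name: name, value: value, options: options || {} });
}

// Local copy of an index function that indexes the same three fields.
function indexCrime(doc) {
  const crime = doc.properties && doc.properties.main_crimecode;
  const coords = doc.geometry && doc.geometry.coordinates;
  if (crime && coords && typeof coords[0] === "number" && typeof coords[1] === "number") {
    index("type", crime, { store: true, facet: true });
    index("long", coords[0]);
    index("lat", coords[1]);
  }
}

// Run the index function on one document and return what it emitted.
function collectIndexedFields(doc) {
  emitted = [];
  indexCrime(doc);
  return emitted;
}

// Hypothetical sample document; its shape is inferred from the field names above.
const sampleDoc = {
  properties: { main_crimecode: "Argue" },
  geometry: { type: "Point", coordinates: [-71.07505979, 42.32865671] }
};

console.log(collectIndexedFields(sampleDoc).map((f) => f.name));
// prints [ 'type', 'long', 'lat' ]
```

A document with no crime code or no coordinates produces an empty list, which mirrors the "index only if those properties exist" rule.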

Now we’re ready to play with the data…

Querying crimes

Term search

Lucene offers a whole range of interesting ways to query text, including fuzzy matching, proximity search, numerical ranges, and more. Here, since the focus is on the geospatial aspects, we’ll just do the most basic of text searches, barely flexing Lucene’s muscles, but it’s enough to illustrate the point. Let’s just ask for crimes involving an argument:

https://examples.cloudant.com/crimes/_design/lucenegeoblog/_search/findcrimes?q=type:Argue

This query returns 13 rows:

{"total_rows":13,  "bookmark":"g1AAAAEWeJzLYWBgYMlgTmFQTElKzi9KdUhJMjTUy00tyixJTE_VS87JL01JzCvRy0styQEqZUpkSLL___9_VgaTmwNPqnMDUCzRFKRfAa7fEo_2JAcgmVQPM4H3rS3YBB00F5jgMSKPBUgyNAApoCn7wcYIioY-ABmjQYJHIMYcgBiD6h-jLADMN1fM",
  "rows":[
    {"id":"79f14b64c57461584b152123e38a58ca","order":[4.2708353996276855,0],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e38ec546","order":[4.2708353996276855,40],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e38c4ce8","order":[3.740839958190918,13],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e3908811","order":[3.740839958190918,38],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e39108e1","order":[3.740839958190918,44],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e38b11d4","order":[3.549445152282715,8],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e38b5c12","order":[3.549445152282715,10],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e38e2803","order":[3.549445152282715,31],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e38e7cbf","order":[3.549445152282715,39],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e3905861","order":[3.549445152282715,44],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e390f947","order":[3.549445152282715,50],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e390bc77","order":[3.549445152282715,51],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e3912dab","order":[3.549445152282715,53],"fields":{"type":"Argue"}}
  ]
}
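
The bookmark in that response is what makes pagination possible: passing it back as the bookmark parameter, together with the same query, asks for the next page. A minimal sketch of building such URLs, assuming Node.js or a browser with the standard URL class; the searchUrl helper, the page size of 5, and the placeholder bookmark string are illustrative, not part of the original example.

```javascript
const FINDCRIMES_URL =
  "https://examples.cloudant.com/crimes/_design/lucenegeoblog/_search/findcrimes";

// Build a search URL from a plain object of query parameters.
// Parameters whose value is undefined are left out.
function searchUrl(base, params) {
  const url = new URL(base);
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) {
      url.searchParams.set(key, String(value)); // values are percent-encoded for us
    }
  }
  return url.toString();
}

// First page: no bookmark yet.
const firstPage = searchUrl(FINDCRIMES_URL, { q: "type:Argue", limit: 5 });

// Next page: same query, plus the bookmark copied from the previous response.
const previousBookmark = "<bookmark from the previous response>"; // placeholder
const nextPage = searchUrl(FINDCRIMES_URL, {
  q: "type:Argue",
  limit: 5,
  bookmark: previousBookmark
});

console.log(firstPage);
console.log(nextPage);
```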

Plotted on a map, those results would look like this:
[Map: Some Boston crimes]

Now, say we want to organize the results by proximity to a local bar we think may be a problem. We know the coordinates of this bar, so we can use a clever sort parameter to accomplish our goal in this new query:

https://examples.cloudant.com/crimes/_design/lucenegeoblog/_search/findcrimes?q=type:Argue&sort="<distance,long,lat,-71.07505979,42.32865671,mi>"

This returns the same 13 rows, but take a look at the ids. The order is now different.

{"total_rows":13,
  "bookmark":"g1AAAAEmeJzLYWBgYMlgTmFQTElKzi9KdUhJMjTSy00tyixJTE_VS87JL01JzCvRy0styQEqZUpkSLL___9_Fpjj5iA578XsvIjgROMskBkKcDMs8BiR5AAkk-qRTOF5cLvueLNbIm8WmktM8BiTxwIkGRqAFNCk_TCjOM-udxWQPpzIgG4UPk9BjDoAMQruKsHexVECphqJOllZAFqPX6Q",
  "rows":[
    {"id":"79f14b64c57461584b152123e3908811","order":[0.0,38],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e38ec546","order":[0.46176565188522095,40],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e38b5c12","order":[0.9774288003583641,10],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e39108e1","order":[1.399243473889131,44],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e38e2803","order":[1.4297353780528468,31],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e3912dab","order":[1.674393221777318,53],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e38b11d4","order":[1.7185707796811796,8],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e390f947","order":[2.1562546799337228,50],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e38a58ca","order":[3.225431956819621,0],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e38c4ce8","order":[3.6097936539303275,13],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e38e7cbf","order":[3.7522872699576357,39],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e3905861","order":[4.388318450202213,44],"fields":{"type":"Argue"}},
    {"id":"79f14b64c57461584b152123e390bc77","order":[6.405184200868535,51],"fields":{"type":"Argue"}}
  ]
}
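
With a distance sort, the first value in each row's order array is the distance from the reference point in the requested units (miles here), which is why the top row shows 0.0. To get a feel for those numbers, here is a rough sketch using the haversine great-circle formula. This is an assumption for illustration only: the haversineMiles name and the Earth-radius constant are mine, and Cloudant's own calculation may differ slightly.

```javascript
const EARTH_RADIUS_MI = 3958.8; // approximate mean Earth radius in miles

// Great-circle distance in miles between two (longitude, latitude) points in degrees.
function haversineMiles(lon1, lat1, lon2, lat2) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_MI * Math.asin(Math.sqrt(a));
}

// Distance from the bar's coordinates to itself, and to a second point nearby.
const bar = { lon: -71.07505979, lat: 42.32865671 };
console.log(haversineMiles(bar.lon, bar.lat, bar.lon, bar.lat)); // prints 0
console.log(haversineMiles(bar.lon, bar.lat, -71.06, 42.30));    // a little over 2 miles
```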

Now we can pay more attention to the crimes at the top of the list and not waste time looking at crimes far from the bar. This doesn’t seem like a big deal with 13 results, but against the full crime database, which has almost half a million crimes, ordering like this would be crucial.

Another way to restrict our search to a small area is to add a geospatial bounding box (or rectangular ‘fence’) to the query, limiting responses to documents whose longitude falls between -71.08 and -71.04 and whose latitude falls between 42.28 and 42.32. Note that the query below sorts by distance from the center of that box (-71.06, 42.30) rather than from the bar’s exact coordinates. Let’s also throw an include_docs=true parameter in the query so we can see all the information in each document.

https://examples.cloudant.com/crimes/_design/lucenegeoblog/_search/findcrimes?q=type:Argue AND long:[-71.08 TO -71.04] AND lat:[42.28 TO 42.32]&sort="<distance,long,lat,-71.06,42.30,mi>"&include_docs=true

I won’t reproduce the entire response here, but it contains only 7 rows. It worked!
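
If you build queries like this in code, the range clauses and the sort expression can be generated from a center point and a half-width in degrees. The following is a sketch under that assumption; boundingBoxQuery, distanceSort, and fencedSearchUrl are hypothetical helper names, and the URL class takes care of encoding the spaces, brackets, and quotes that the hand-written URL above leaves raw.

```javascript
const SEARCH_ENDPOINT =
  "https://examples.cloudant.com/crimes/_design/lucenegeoblog/_search/findcrimes";

// Lucene range clauses for a square box of +/- halfWidthDeg degrees around a point.
// toFixed() trims floating-point noise such as -71.08000000000001.
function boundingBoxQuery(lon, lat, halfWidthDeg, decimals = 2) {
  const fmt = (n) => n.toFixed(decimals);
  return (
    `long:[${fmt(lon - halfWidthDeg)} TO ${fmt(lon + halfWidthDeg)}]` +
    ` AND lat:[${fmt(lat - halfWidthDeg)} TO ${fmt(lat + halfWidthDeg)}]`
  );
}

// Sort expression in the form used above: "<distance,long,lat,LON,LAT,UNITS>".
function distanceSort(lon, lat, units = "mi") {
  return `"<distance,long,lat,${lon},${lat},${units}>"`;
}

// Full search URL: term match, bounding box, distance sort, and full documents.
function fencedSearchUrl(term, lon, lat, halfWidthDeg) {
  const url = new URL(SEARCH_ENDPOINT);
  url.searchParams.set("q", `type:${term} AND ${boundingBoxQuery(lon, lat, halfWidthDeg)}`);
  url.searchParams.set("sort", distanceSort(lon, lat));
  url.searchParams.set("include_docs", "true");
  return url.toString();
}

console.log(boundingBoxQuery(-71.06, 42.30, 0.02));
// prints: long:[-71.08 TO -71.04] AND lat:[42.28 TO 42.32]
console.log(fencedSearchUrl("Argue", -71.06, 42.30, 0.02));
```

Keeping the box and the sort point derived from the same center is a design choice that matches the query above; pass different coordinates to distanceSort if you want to rank by distance from some other place.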

You’ve glimpsed the power of combining basic geospatial queries with Lucene’s extraordinary text search capabilities. The possibilities are truly endless. Comment here to let us know how you use it, and you could be a future guest star here on our blog.
