Linhao Xu is a software developer for IBM Cloudant, with over 16 years of database experience. His expertise includes index and query processing in peer-based data management systems and spatial-temporal tracking of moving objects. Prior to earning a Ph.D. in computer science from Fudan University, he won APWEb’s Best Student Paper award. He served on the program committee for DASFAA 2006, P2PSearch 2007, and as Web Chair of SIGMOD 2007.

Nearest Neighbor (NN) is the most common distance query developers implement to help users do location-based search, like finding the nearest post office or bank and showing it on a map. Specifically, given a coordinate P, a NN query of P returns the object whose coordinate is closest to P. The k Nearest Neighbor (kNN) query returns the k objects that are closer to P than others.

Cloudant Geospatial offers kNN query as a feature, and this blog covers a couple different ways to perform a kNN query in your apps.

Geospatial kNN Query Options in Cloudant

First off, as a developer, you need to know that there are two different options to perform geospatial kNN query in Cloudant:

  • Cloudant Geo offers the most flexible geospatial kNN query capability. You can perform a geospatial kNN query by radius and standard GeoJSON geometries (e.g., linestring and polygon), rather than by point or rectangle. However, so far, you cannot query kNN objects with any other non-geospatial attributes at the same time.

  • Cloudant Search combines geospatial kNN query with non-geospatial attribute text search. For example, a policeman may search a crime database to find which k criminal events occurred last week close to the Chinatown metro station. However, you cannot perform a geospatial kNN query against complex geometries like a polygon.

So far, Cloudant measures distance (including kNN) in an “as the crow flies” straight line, not in network distance, like the street path you'd follow.

Here, I use the Boston crime dataset to demonstrate how to make geospatial kNN queries. Refer to our previous blog on Geospatial Query with Cloudant Search for the details of the crime dataset (and some more background on geo searching with Cloudant).

Geospatial kNN Query in Cloudant Geo

First, we build a geospatial index with Cloudant Geo and then demonstrate three kNN query examples on the indexed Boston crime dataset.

Tip: You can run all the query URLs in your browser.

Index Boston Crime Dataset

To index the Boston crime dataset, create a design document for Cloudant Geo by using the pre-defined keyword st_indexes to hold one or more Cloudant Geo index definitions, where each index must be defined by st_index function.

For example, in the following design document, the value of the attribute index is a JavaScript function that first checks if doc.geometry and doc.geometry.coordinates exists or not. If true, then system pre-defined st_index function is called to index the geometry object of each document.

{
    "_id": "_design/geodd",
    "st_indexes": {
        "geoidx": {
            "index": "function(doc) { if (doc.geometry && doc.geometry.coordinates && doc.geometry.coordinates.length > 0) { st_index(doc.geometry); }}"
        }
    }
}

Perform geospatial kNN query

Now that you've set the Cloudant Geo index, you can issue a geospatial kNN query against a location (-71.0537124, 42.3681995), to find the five crimes closer to this location. To do so, you must provide a query geometry g=POINT(-71.0537124 42.3681995) that specifies the location and nearest=true that enables nearest neighbor query.

http://examples.cloudant.com/crimes/_design/geodd/_geo/geoidx?g=POINT(-71.0537124 42.3681995)&nearest=true&limit=5&include_docs=true

Cloudant Geo also lets you make a geospatial kNN query against a complex geomery like polygon. To do so, combine both geometric relation and nearest=true. For example, to find the five crimes that occurred nearest, but not within an area in Boston, you can issue the query below where the returned crimes are displayed in the following picture.

http://examples.cloudant.com/crimes/_design/geodd/_geo/geoidx?g=polygon((-71.0537124 42.3681995,-71.054399 42.3675178,-71.0522962 42.3667409,-71.051631 42.3659324,-71.051631 42.3621431,-71.0502148 42.3618577,-71.0505152 42.3660275,-71.0511589 42.3670263,-71.0537124 42.3681995))&relation=disjoint&nearest=true&limit=15&include_docs=true

An example 5NN query on Boston crime dataset

Geospatial kNN query in Cloudant Search

Now, I'll show how to use Cloudant Search to do geospatial kNN query.

Tip: You need to index prior to running a query. To learn how to index the Boston crime dataset, read our previous blog on Geospatial Query with Cloudant Search.

In Cloudant Search, we combine both limit and sort to implement geospatial kNN query. For example, to find the five crimes that are closer to the location of (-71.07505979,42.32865671), we can make a kNN query with limit=5 and sort="<distance,long,lat,-71.07505979,42.32865671,mi>" where the distance is measured in miles.

https://examples.cloudant.com/crimes/_design/lucenegeoblog/_search/findcrimes?q=type:Argue&sort="<distance,long,lat,-71.07505979,42.32865671,mi>"&limit=5&include_docs=true

Our second example is a bit complex and tries to identify the five crimes which are within a specific rectangle of ((-71.08,42.28) (-71.04,42.32)), and closer to the location (-71.06,42.30), and whose type or main_crimecode is Argue.

https://examples.cloudant.com/crimes/_design/lucenegeoblog/_search/findcrimes?q=type:Argue AND long:[-71.08 TO -71.04] AND lat:[42.28 TO 42.32]&sort="<distance,long,lat,-71.06,42.30,mi>"&limit=5&include_docs=true

Summary

Depending on what results you want, choose the geospatial kNN query method that fits best. Use the following summary and comparison table as your guide:

  • Cloudant Geo lets you perform geospatial kNN search by a combination of a query geometry like polygon and nearest=true, with or without additonal geometric relation like disjoint.

  • Cloudant Search lets you perform geospatial kNN search by sort (like sort="<distance,long,lat,-71.06,42.30,mi>"), with or without a rectangle bounding box.

Cloudant Geo Cloudant Search
k (an integer) limit=k limit=k
reference geometry types point, radius, ellipse, rectangle, linestring, polygon point
distance measured by reference geometry the center of the reference geometry specified by sort
geometric relation nearest=true (a must) with other relation like disjoint N.A.

Join The Discussion

Your email address will not be published. Required fields are marked *