Imagine you have a database of geo-located data: churches, houses, people, Pokémon–whatever. You need to be able to quickly return the nearest items to a given point (latitude and longitude). How do you do it? Let’s say we have some data like this:

name latitude longitude
NW89AY 51.5321531084 -0.17797924481
NW89AZ 51.5338736618 -0.17759333684
NW89BD 51.533115598 -0.17799848667

where each of 1.7 million rows represents a UK postcode and the latitude and longitude of its centre.

Abbey Road

How would we find the nearest postcode to our current location? Here are a few methods:

  1. We could just go through each of the 1.7 million records in turn calculating how far each one is to us and picking the smallest distance. This works for one-off queries but is computationally expensive and is not a suitable solution if we want great performance.
  2. We could put the data into a relational database and index the longitude column. This would allow us to select vertical slices of data to reduce the amount of calculations we would need to do. This helps a little but is not an optimal solution, as it only reduces the data size in one axis.
  3. The ideal solution is build a two-dimensional index using both the latitude and longitude. PostgreSQL has geospatial functions as has the IBM Cloudant database, but for this example we’re going to use Redis.

Why Redis?

Redis is an in-memory database making read and write operations blindingly quick. It provides simple commands that you can use like building blocks to assemble your own data structures, including some geospatial functions that use two-dimensional indices built from supplied latitude and longitude data. Redis converts the coordinates into geohashes which when sorted in an index, lets nearby objects be close together in storage.

Data can be added into a Redis geospatial index using its command-line shell (redis-cli) using the GEOADD command:

 GEOADD postcodes -0.17797924481 51.5321531084 NW89AY

This adds a postcode NW89AY at lat/long 51.5/-0.2 to an index called postcodes. We can then query the index to see which of the postcodes is nearest to our location:

 GEORADIUS postcodes -1.0 54.4 400 km
1) "NW89AY"

This returns the nearest postcode. Although we have only added one postcode into the index at this point!

Finding the source data

In the UK, the Ordnance Survey publishes tons of data including the Code Point Open data set which provides a list all of the postcodes in the UK, where they are on the map, and which adminstrative regions they belong to. After downloading the data, we find we have 121 separate CSV files whose contents are of this form:

"AB101AA",10,394251,806376,"S92000003","","S08000020","","S12000033","S13002483"
"AB101AB",10,394235,806529,"S92000003","","S08000020","","S12000033","S13002483"
"AB101AF",10,394181,806429,"S92000003","","S08000020","","S12000033","S13002483"
"AB101AG",10,394251,806376,"S92000003","","S08000020","","S12000033","S13002483"
"AB101AH",10,394371,806359,"S92000003","","S08000020","","S12000033","S13002483"
"AB101AL",10,394330,806529,"S92000003","","S08000020","","S12000033","S13002483"
"AB101AN",10,394296,806581,"S92000003","","S08000020","","S12000033","S13002483"
"AB101AP",10,394309,806459,"S92000003","","S08000020","","S12000033","S13002483"

We are interested in the 1st, 3rd, and 4th columns. The rest we can ignore.

Importing data in bulk

It turns out the Ordnance survey prefers to publish its coordinates as eastings and northings in a coordinate system which best fits the geography of the UK. Converting these coordinates into WGS84 latitude and longitudes requires some pretty hairy maths, but luckily Hannah Fry’s blog post distils this down into some Python code we can repurpose. I did so and posted tolatlng.py for you to download into the folder where you saved the .csv files.

Modifying Hannah’s code slightly, we can pipe in all the CSV files in one go and output a stream of Redis GEOADD commands to be saved to a text file, or even better, to be sent directly into Redis:

cat *.csv | ./tolatlng.py | redis-cli

It took 12.5 minutes to import the 1.7 million postcodes to a local Redis database on my Mac, most of this time being taken to convert the coordinates in Python. We can make this even more efficient by converting the Redis commands into the lower-level Redis protocol. Download toredis.py into the folder where you saved the data files. Then we can pipe this stream of data into redis-cli --pipe instead. Using --pipe means that Redis expects Redis protocol input and only replies once at the end of import, not once after each command. This is particularly important when importing data to a remote Redis instance:

cat *.csv | ./tolatlng.py | ./toredis.py | redis-cli --pipe

Querying the index

Once imported into Redis, the index occupies just over 200Mb of RAM and is ready to query with the GEORADIUS command:

 GEORADIUS postcodes -1.2 54.5 5 km WITHDIST COUNT 10
1) 1) "TS80AN"
   2) "0.2072
2) 1) "TS80AJ"
   2) "0.5106"
3) 1) "TS89EB"
   2) "0.7001"
4) 1) "TS70NU"
   2) "0.8965"
5) 1) "TS80AH"
   2) "0.9330"
6) 1) "TS80AW"
   2) "0.9340"
7) 1) "TS80AL"
   2) "1.0324"
8) 1) "TS89RZ"
   2) "1.1061"
9) 1) "TS80AD"
   2) "1.1364"
10) 1) "TS80AQ"
   2) "1.1449"

which translates to “find the nearest 10 postcodes to this lat/long (within a 5km radius), and also calculate the distance”. Redis performs this search in less than 10 milliseconds.

map

Every Redis command comes with an indication of its complexity and the GEORADIUS command is listed as:

O(N+log(M)) where N is the number of elements inside the bounding box of the circular area delimited by center and radius and M is the number of items inside the index.

In other words, if you want great performance, keep the radius as small as possible: the more points that the circle covers, the slower the query.

Running on Redis by Compose

Compose offers fault-tolerant Redis clusters as-a-service in a choice of data centers. You can use your local redis-cli command-line tool to interact with your Compose Redis cluster by copying hostname, port, and password credentials from your Compose dashboard:

cat *.csv | ./tolatlng.py | ./toredis.py | redis-cli -h sl-us-dal-9-portal.99.dblayer.com -p 10000 -a MYPASSWORD --pipe

Links

Join The Discussion

Your email address will not be published. Required fields are marked *