IBM Developer Advocacy

Introducing the Simple Autocomplete Service



Glynn Bird
5/31/16

We have all seen auto-complete on web forms. The field label is Town. We start typing “M” then “a” and before we know it, a pull-down list has appeared suggesting some words that begin with the letters we’ve typed:

demo gif

The more characters we type, the smaller the list gets and we can click on the correct town name at any time.

Websites build such tools in one of two ways:

  1. The entire data set is transferred to the web page and autocomplete happens within the browser
  2. No data is transferred to the browser; each keypress triggers a search for matching items on a server-side API

The first solution is best for small data sets, but when the list of possible values is larger (say hundreds, thousands, or even millions of options), then the client-server approach is much more efficient. It is this second scenario that the Simple Autocomplete Service is built to cover.

What is the Simple Autocomplete Service?

The Simple Autocomplete Service is a Node.js web app built with the Express framework that lets you upload multiple data sets to a cloud service which then operates a fast and efficient autocomplete API. Later in this article, we’ll see how it works in detail, but the tl;dr is: it uses a Redis in-memory database to store and index the data.

Here are some example API calls from a deployed Simple Autocomplete Service instance that has been locked down so that it is now read-only:

  • https://simple-autocomplete-service.mybluemix.net/api/countries?term=bo
  • https://simple-autocomplete-service.mybluemix.net/api/presidents?term=W

Notice how the urls show the two individual data sets that have been uploaded (countries and presidents). The search string that you want to lookup is supplied as a term parameter. The API can then be plumbed into a webpage to provide auto-complete on a form control.

You can run the application locally in conjunction with a local Redis instance or deploy to the IBM Bluemix platform-as-a-service with a connected Redis by Compose service.

Installation

Click this button

Deploy to Bluemix

to deploy the app to Bluemix, IBM’s cloud development platform. If you don’t yet have a Bluemix account, you’ll be prompted you to sign up.
(You’ll also find this button in the Simple Autocomplete Service source code repository.)

Upon deployment, you’ll get an error saying that deployment failed. No worries! It didn’t really. It just requires Redis.

deployment failed!

Click the APP DASHBOARD button and click your new Simple Autocomplete Service to open it.

To add Redis:

  1. In a new browser tab, head over to https://www.compose.io/ and sign up for an account there.
  2. Hit Create Deployment then choose Redis and wait for a cluster to be created for you.
  3. On the Getting Started page that appears, click reveal your password and leave this page open. You’ll come back for these Redis credentials in a moment.

    compose credentials

  4. Head back to Bluemix and where you have your Simple Autocomplete Service open.

  5. Click ADD A SERVICE OR API and choose Redis by Compose.

    Redis By Compose on Bluemix

  6. Enter your credentials as follows:

    • For Username enter only x
    • In Password enter your Redis service password.
    • For Public hostname/Port enter the string that appears in the TCP Connection String box after the @ character, replacing the : character with / as illustrated:

    redis credentials

    When you enter these credentials, your completed form looks something like this:

    credentials in service form

  7. Click Create.

  8. When prompted, click Restage.

    When the app is done staging, click its URL to launch and see the service in action.

Uploading data

Find or create a file with your own data. It should be a plain text file with one text string per line, like:

William
Mary
John

From the menu on the left, click Create an index, enter an Index name, and click the Upload button.

Upload screenshot

Scroll up to Current Indexes and in a few seconds you see your new index in the list. Try a few auto-completes by typing letters in the Test box. You can add as many indexes as you need (or until you run out of Redis memory).

try it screenshot

The Simple Autocomplete Service is really an API service. You can try the API call directly in a new browser window, just visit the URL of this form:

https://MYAPP.mybluemix.net/api/MYINDEX?term=a

replacing MYAPP with your application domain and MYINDEX with the name you chose when you created the index.

Locking down the service

When you’re happy with your data, you can lock down the Simple Autocomplete Service so that it becomes a read-only API. Simply add a custom environment variable to your Bluemix app called “LOCKDOWN” with a value of “true”. Your application will restart and only the autocomplete API will function.

bluemix lockdown

Integrating with your own forms

The Simple Autocomplete Service is CORS-enabled, so it should be simple to plumb it into your own forms. If you have an HTML page with jQuery and jQueryUI in, you can create an auto-complete form with a few lines of code:

<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <script src="https://code.jquery.com/jquery-1.10.2.js"></script>
  <script src="https://code.jquery.com/ui/1.11.4/jquery-ui.js"></script>
  <link rel="stylesheet" href="https://simple-autocomplete-service.mybluemix.net/css/sas.css" />
  <script>
    $(function() {
      $("#mycontrol").autocomplete({
        source: "https://simple-autocomplete-service.mybluemix.net/api/countries",
        minLength: 1,
        delay:0
      });
    });
  </script>
</head>
<body>
  <div class="ui-widget">
    <input id="mycontrol">
  </div>
</body>
</html>

The anatomy of the Simple Autocomplete Service

First principles

Redis is chosen as the database for this task because it stores its data in memory (it is flushed to disk periodically). In-memory databases are extremely fast and the auto-complete use-case requires high performance because the use of the web form will expect a speedy reponse to the keypresses they make.

The heart of our autocomplete service is the data that is uploaded. Any text file containing one line per value should be fine e.g.

.
.
.
Mabel
Mabelle
Mable
Mada
Madalena
Madalyn
Maddalena
Maddi
Maddie
.
.
.

One solution to find matches from this data is to store the values in a list and scan every member for matches when performing an autocomplete request.
This solution is fine for small data sets but as it involves scanning the whole collection from top to bottom to establish a list of matches it becomes increasingly inefficient as the data size increases. It is said to have a O(N) complexity, because the effort required to perform the search increases linearly with the size of the data set (N).

In a blog post from 2010 the creator of Redis, Salvatore Sanfilippo, discusses a more efficient solution which involves pre-calculating the possible search strings and placing them into an “sorted set” data structure in Redis. Sorted sets are usually used for ordering keys by value (e.g. a high-score table), but in this case it keeps our candidate search strings in alphabetical order. The solution outlined in the blog post is used in a slightly modified form in the Simple Autocomplete Service, with our sorted set containing keys made up of combinations of possible letter combinations:

.
.
"m"
"ma"
"mab"
"mabe"
"mabel*Mabel"
"mabell"
"mabelle*Mabelle"
"mabl"
"mable*Mable"
.
.

Some features of the data to notice:

  • this index occupies more space than a simple list of the complete values
  • the keys are stored in alphabetical order
  • the keys are lowercased and filtered for punctuation before saving for a predictable, case-sensitive match
  • at the end of each sequence of keys we store the unaltered original key using the notation mabelle*Mabelle, with the original unfiltered string placed after the asterisk. This allows the service to access the original string in its original case.
  • the keys are not repeated – there is only one key for “ma” despite several names starting with “ma” to save space in the index
  • the method of storage is most efficient on large data sets with lots of repetition at the starts of words

Importing the data

The Simple Autocomplete Service adds strings to the Redis database using the ZADD command to create a sorted set:

> ZADD myindex 0 "m"
> ZADD myindex 0 "ma"
> ZADD myindex 0 "mab"
> ZADD myindex 0 "mabe"
> ZADD myindex 0 "mabel*Mabel"

The zero in the syntax above is the score of the sorted set. We set all the strings to have the same score so that only alphabetical ordering takes place.

Querying the data

When we wish to find the auto-complete solutions for the string ma, we need to find our way to ma in our Redis index and then retrieve a number of keys that occur after that point in the index.

In Redis, we use two queries to do this

  1. ZRANK to find the place in the index that matches our search string
  2. ZRANGE 75 to find the 75 lines that occur in the index from that point on. 75 is number hard-coded into the service to return a reasonable number of solutions to the query.

e.g.

> ZRANK myindex ma
(integer) 7429
> ZRANGE myindex 7429 7504
 1) "ma"
 2) "mab"
 3) "mab*Mab"
 4) "mabe"
 5) "mabel"
 6) "mabel*Mabel"
 7) "mabell"
 8) "mabelle*Mabelle"
 9) "mabl"
10) "mable*Mable"
11) "mad"
12) "mada"
13) "mada*Mada"
14) "madal"
15) "madale"
16) "madalen"
17) "madalena*Madalena"
18) "madaly"
19) "madalyn*Madalyn"
20) "madd"
.
.
.

The service only keeps the keys with an asterisk in the middle (the complete answers) and then returns those values to the user:

["Mab","Mabel","Mabelle","Mable"]

As the index is stored in order by Redis, the ZRANK function is an O(log n) operation, meaning that its complexity only increases in proportion to the logarithm of the data size (N). The ZRANGE query is similarly efficient so the amount of work required to perform a search using the ZRANK/ZRANGE technique remains almost constant whatever the data size.

How many strings would we need have in our text file before the ZRANK/ZRANGE solution out-performs scanning a linear list? The answer is less than 100. It’s a no-brainer; the indexed solution wins in all but the very simplest cases.

Homework

As it stands, the Simple Autocomplete Service only matches strings that begin with search phrase. What if I wanted to match on the second word of a phrase? Imagine I indexed actors names:

Molly Ringwald
Judd Nelson
Paul Gleason
Anthony Michael Hall
Ally Sheedy
Emilio Estevez

I want autocomplete to work when I type “A..L” as well as if I type “S..H”.

That would involve indexing additional data

a
al
all
ally
ally s
ally sh
ally she
ally shee
ally sheed
ally sheedy*Ally Sheedy
s
sh
shee
sheed
sheedy*Ally Sheedy

The index would be bigger in this case, but it should work. If anyone would like to modify the source code repository and send me a pull request, I’d be happy to incorporate this as an option.

blog comments powered by Disqus