Turning your spreadsheet or mysql.dump into a faceted search engine just got a lot easier. Try out our new Simple Search Service, built to help you create and manage a useful, polished search engine for your own site or app.

I’ve blogged before about turning spreadsheet data into a faceted search engine. That tutorial has a few basic steps:

  1. sign up for an IBM Cloudant NoSQL database account
  2. use couchimport to import your spreadsheet data into Cloudant
  3. instruct Cloudant to index the data using a Design Document
  4. perform a Cloudant Search query

If you’re familiar with NoSQL databases and Cloudant or Apache CouchDB in particular, you should find those steps relatively easy to follow. But for someone new to NoSQL, there’s a lot to learn in there before hitting the search API: JSON, command-line tools, design documents, and Lucene query syntax to name just a few.

The Cloud Data Services Developer Advocacy team is always looking to make things as easy as possible. To that end, we are today unveiling the Simple Search Service, which greatly simplifies the steps to turning your tabular data into a faceted search engine.

To try it out, visit the Simple Search Service repository on GitHub and click the Deploy To Bluemix button. This installs the code in your IBM Bluemix account, connects the services it needs, and gives you a simple web front-end that lets you import and index your spreadsheet data. (Bluemix has a free trial, so it won’t cost you anything to try out Simple Search Service in the first month.)

What is the Simple Search Service?

Simple Search Service is a Node.js app that you can get and use immediately by deploying to the IBM Bluemix platform-as-a-service with a couple of mouse clicks. Deployment gets you your own working instance of the app, automatically provisions a Cloudant account, attaches it to the service, and presents a web app that lets you upload a data file. When you upload data, it’s automatically imported into Cloudant, with every field indexed for search.

Simple Search Service then exposes a RESTful search API that your application can use. The API is CORS-enabled, so your client-side web app can use it without issue. The API is also cached, meaning that it stores popular searches in an in-memory data store for faster retrieval, giving your application better performance.

Uploading data

The Simple Search Service home page invites you to upload a CSV (comma-separated) or TSV (tab-separated) file:

[Figure: Uploading data to Simple Search Service. Uploading a CSV or TSV is easy.]

Simple Search Service expects the first line of the file to contain the column headings like this:

transaction_id  description  price  customer_name  date
42              Pet food     24.22  Jones          2015-04-02
43              Cake         9.99   Smith          2015-04-02

Files must be comma- or tab-separated, and filenames must end in either .csv or .tsv.

Simple Search Service will accept the following data types:

  • strings
  • numbers
  • booleans
  • arrays of strings (separated by commas)

Records like this:

person_id  first_name  last_name  score  passed  tags
1          Glynn       Bird       45.3   true    uk,tall,glasses
2          Mike        Broberg    24.1   false   us,short,funny

would be turned into the following JSON documents:

  {
    "person_id": 1,
    "first_name": "Glynn",
    "last_name": "Bird",
    "score": 45.3,
    "passed": true,
    "tags": ["uk", "tall", "glasses"]
  }

  {
    "person_id": 2,
    "first_name": "Mike",
    "last_name": "Broberg",
    "score": 24.1,
    "passed": false,
    "tags": ["us", "short", "funny"]
  }

The values within the score and passed fields are not wrapped in quotation marks. That’s because they’re not strings; they’re numbers and boolean values. In most cases, Simple Search Service detects the data types by examining the first few lines of the file, but it also gives you the opportunity to override its choices.
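To make the conversion concrete, here is a minimal Python sketch of how tab-separated rows like the ones above could be turned into typed documents. It is not the service's own code: the function names infer_value and rows_to_documents are made up for illustration, the inference rules are an assumption, and the array fields are passed in by hand rather than detected.

```python
import csv
import io
import json


def infer_value(text):
    """Convert one cell to a boolean, a number, or a string (assumed rules)."""
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    # Try int first so that "1" stays 1 instead of becoming 1.0.
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    return text


def rows_to_documents(tsv_text, array_fields=()):
    """Turn tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    documents = []
    for row in reader:
        doc = {}
        for field, cell in row.items():
            if field in array_fields:
                # Array cells hold comma-separated strings.
                doc[field] = [item.strip() for item in cell.split(",")]
            else:
                doc[field] = infer_value(cell)
        documents.append(doc)
    return documents


SAMPLE = (
    "person_id\tfirst_name\tlast_name\tscore\tpassed\ttags\n"
    "1\tGlynn\tBird\t45.3\ttrue\tuk,tall,glasses\n"
    "2\tMike\tBroberg\t24.1\tfalse\tus,short,funny\n"
)

if __name__ == "__main__":
    for doc in rows_to_documents(SAMPLE, array_fields=("tags",)):
        print(json.dumps(doc))
```

Passing the array fields explicitly mirrors the override step in the upload form: a cell such as uk,tall,glasses could equally be a plain string, so the sketch leaves that decision to the caller.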

At this point you may also choose which fields you would like to be “faceted”, by ticking the facet box next to each field:

[Figure: Selecting fields to facet for Simple Search Service. Specify facets on fields upon data import.]

Choose fields you’d use to group your data. Faceting counts the occurrences of each field value in a result set. This gives someone searching your data an insight into the composition of the dataset at a glance. The fields you want to facet are usually ones where the values tend to repeat frequently, like these:

  • category names
  • tags
  • enumerations
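As a rough illustration of what "counting the occurrences of each field value" means, the Python sketch below tallies facet values over a small result set. The real counting is done by the search index on the server; the count_facets function and the sample records here are invented purely for this example.

```python
from collections import Counter


def count_facets(results, facet_fields):
    """Count how often each value occurs, per facet field, in a result set."""
    counts = {field: Counter() for field in facet_fields}
    for doc in results:
        for field in facet_fields:
            value = doc.get(field)
            if value is None:
                continue
            # Array fields such as tags contribute one count per element.
            values = value if isinstance(value, list) else [value]
            counts[field].update(values)
    return counts


# Made-up records, only to show the shape of the output.
RESULTS = [
    {"brand": "fender", "type": "electric", "tags": ["blonde", "vintage"]},
    {"brand": "fender", "type": "bass", "tags": ["vintage"]},
    {"brand": "gibson", "type": "electric", "tags": []},
]

if __name__ == "__main__":
    for field, counter in count_facets(RESULTS, ["brand", "type", "tags"]).items():
        print(field, dict(counter))
```

A field with a handful of repeating values produces a short, readable tally like this, while a field with unique values (an ID, say) would produce one entry per record and tell the searcher nothing.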

You can see an example of faceted search results in the guitars example app for the tutorial I mentioned at the start of this article. The faceted fields (type, range, brand, country, year) appear to the right of the result set and have been programmed to act as secondary filters within the search results.

[Figure: Example of search facets from the Guitars demo app. What makes a good facet?]

Simple Search Service API

The Simple Search Service API is a simplified version of the Cloudant Search API. With Simple Search Service, there are only two parameters:

  • q – the query you wish to perform (default = *:*)
  • cache – whether to cache search results (default = true)

The API expects GET requests to /search, e.g. /search?q=brand:fender. Here are some example queries:

  • q=*:* – return everything
  • q=brand:fender – a fielded search looking for a specific value of the field ‘brand’
  • q=brand:fender OR brand:gibson – a more complicated fielded search with an ‘OR’ clause
  • q=blonde+fender+telecaster – a simple, free-text search
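When calling the API from code, the query string needs URL-encoding. The Python sketch below builds request URLs for the /search endpoint using only the two documented parameters, q and cache. The host name is a placeholder and search_url is a helper invented for this example, so treat it as a starting point rather than an official client.

```python
from urllib.parse import urlencode

# Hypothetical host: replace with the URL of your own deployed instance.
BASE_URL = "https://my-simple-search.example.com"


def search_url(query="*:*", cache=True, base_url=BASE_URL):
    """Build a GET URL for /search with the q and cache parameters."""
    params = {"q": query}
    if not cache:
        # cache defaults to true, so only send it when opting out.
        params["cache"] = "false"
    # urlencode percent-encodes ':' and '*' and turns spaces into '+'.
    return f"{base_url}/search?{urlencode(params)}"


if __name__ == "__main__":
    print(search_url())
    print(search_url("brand:fender"))
    print(search_url("brand:fender OR brand:gibson", cache=False))
```

Fetching the resulting URL with any HTTP client should return the JSON search results.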

Under the hood, Simple Search Service adds further parameters to the underlying Cloudant Search request to ensure that the document body is returned, that counts of faceted fields are returned, and that the returned JSON is simplified.

Simple Search Service automatically caches all search results for an hour. You can override this behaviour by adding a cache=false parameter to each Simple Search Service API search request.
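The idea of a time-limited cache can be sketched in a few lines of Python. This is an illustration of the technique, not the service's implementation: TTLCache and cached_search are made-up names, and whether a cache=false request also skips storing its result is an assumption made here.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._entries = {}  # key -> (expiry time, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._entries[key]
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self.clock() + self.ttl_seconds, value)


def cached_search(cache, query, run_search, use_cache=True):
    """Return a cached result for query, or call run_search and store it."""
    if use_cache:
        hit = cache.get(query)
        if hit is not None:
            return hit
    result = run_search(query)
    if use_cache:
        # Assumption: uncached requests do not populate the cache either.
        cache.set(query, result)
    return result
```

Here run_search stands for whatever function actually queries the database; repeated calls with the same query within an hour reach it only once.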

Using Redis as a cache

By default, Simple Search Service uses an in-memory hash table to cache common search results. This is fine for testing, but if you are going to run multiple Simple Search Service nodes, each node would keep its own separate cache, so it makes sense to have a centralised one. Redis is an in-memory database and can be easily integrated into a Simple Search Service installation. To do so:

  1. Sign up for an account at compose.io
  2. Create a Redis cluster and make a note of the URL and password of your cluster
  3. In Bluemix, add a Redis by Compose service, ensuring that you name it Redis by Compose — with no appended characters
  4. Configure your Bluemix Redis service with the URL and password from your Compose.io account

[Figure: Adding Redis by Compose to Simple Search Service for centralised caching of results. Add a centralised cache with Redis by Compose to scale up your deployment.]

When Simple Search Service reboots, it will detect the attached Redis service and use that for its caching layer.
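Detection of this kind typically relies on the VCAP_SERVICES environment variable, which Cloud Foundry platforms such as Bluemix set to a JSON description of the services bound to an app. The Python sketch below shows one way to look a service up by name and fall back to the in-memory cache. It is an assumption about the general approach, not the app's actual code, and find_service and choose_cache_backend are hypothetical helpers. It also shows why the exact service name matters.

```python
import json
import os


def find_service(name, environ=os.environ):
    """Return the credentials of the bound service called name, or None."""
    raw = environ.get("VCAP_SERVICES")
    if not raw:
        return None
    # VCAP_SERVICES maps each service label to a list of bound instances.
    services = json.loads(raw)
    for instances in services.values():
        for instance in instances:
            if instance.get("name") == name:
                return instance.get("credentials", {})
    return None


def choose_cache_backend(environ=os.environ):
    """Pick 'redis' when a service with the expected name is attached."""
    credentials = find_service("Redis by Compose", environ)
    return "redis" if credentials is not None else "memory"


if __name__ == "__main__":
    print(choose_cache_backend())
```

Because the lookup is an exact string match, a service named anything other than Redis by Compose would go unnoticed and the app would carry on with its in-memory cache.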

Try it

See for yourself. Visit the Simple Search Service repository on GitHub to preview the code, or click the Deploy To Bluemix button below. After it deploys, click the View your app button and upload your data. Happy searching!

Deploy to Bluemix
