Today, as IBM Cloudant, we’re open-sourcing the code repositories behind Cloudant Search: Clouseau and Dreyfus. This code powers Cloudant’s full-text search system, which combines our Apache CouchDB™-based service with the Apache Lucene™ text search engine library.
In an earlier blog post I described the new 2.0 release of Search, which was a ground-up rewrite of the existing search system. That is the code we’re open-sourcing today.
We intend to contribute Clouseau and Dreyfus to the Apache CouchDB project, pending that project’s approval, but we’re making the code public in the meantime.
Cloudant Search consists of two complementary projects:
- Clouseau is Scala code that provides access to the Lucene library and uses it to run the search queries.
- Dreyfus is written in Erlang and plugs into the existing CouchDB/Cloudant clustering technologies. It manages Clouseau nodes to deliver full-text search features.
The two systems communicate using Erlang’s external term format (using the Scalang library on the Clouseau side).
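To give a feel for what travels over that link, here is a minimal sketch of an encoder for a small subset of the external term format (atoms, small integers, binaries and tuples). It is written in Python purely for illustration. The `Atom` helper and the `{search, <<"idx">>, 10}` message shape are assumptions made up for this example; they are not the actual messages Dreyfus and Clouseau exchange.

```python
import struct

VERSION = 131  # every encoded term starts with this version byte


class Atom(str):
    """Hypothetical marker type: a string to be encoded as an Erlang atom."""


def encode_term(term):
    """Encode one term, without the leading version byte."""
    if isinstance(term, Atom):
        raw = term.encode("utf-8")
        if len(raw) > 255:
            raise ValueError("atom too long for SMALL_ATOM_UTF8_EXT")
        return bytes([119, len(raw)]) + raw  # SMALL_ATOM_UTF8_EXT
    if isinstance(term, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(term, int):
        if 0 <= term <= 255:
            return bytes([97, term])  # SMALL_INTEGER_EXT
        if -2**31 <= term < 2**31:
            return bytes([98]) + struct.pack(">i", term)  # INTEGER_EXT
        raise ValueError("big integers are not handled in this sketch")
    if isinstance(term, (bytes, bytearray)):
        # BINARY_EXT: 4-byte big-endian length, then the raw bytes
        return bytes([109]) + struct.pack(">I", len(term)) + bytes(term)
    if isinstance(term, tuple):
        if len(term) > 255:
            raise ValueError("tuple too large for SMALL_TUPLE_EXT")
        # SMALL_TUPLE_EXT: 1-byte arity, then each element in order
        return bytes([104, len(term)]) + b"".join(encode_term(t) for t in term)
    raise TypeError(f"unsupported term: {term!r}")


def encode(term):
    """Encode a complete term, including the version byte."""
    return bytes([VERSION]) + encode_term(term)


# Illustrative message only: {search, <<"idx">>, 10}
message = encode((Atom("search"), b"idx", 10))
```

Because both sides agree on this byte layout, the Erlang side can produce terms with its built-in `term_to_binary/1` while the Scala side decodes them through its own library.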
Aside from providing some fun Scala and Erlang to study, we hope this code will prove useful to others building distributed systems that interface with Lucene.
When we donated Mango (the MongoDB-inspired query language interface) to the CouchDB project last year, it included integration with Cloudant Search, allowing ad hoc, declarative querying. We will work with the Apache CouchDB development community to enable that integration for users who choose to deploy the open source Cloudant Search feature.
The original release of Mango was very well received, so we’re excited to see what people do with this new and significant enhancement.
The benefit of Mango is that users no longer have to understand the structure of different JSON documents before they can issue a query. We introduced this capability recently in the article “Cloudant Query Grows Up to Handle Ad Hoc Queries,” by my colleague Glynn Bird. It’s a fine introduction to the expanded set of parameters and query operators in the updated Cloudant Query.
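As a rough illustration of what a declarative query looks like, the sketch below builds the JSON body for a request to the `_find` endpoint. The database name `movies` and the field names are hypothetical, chosen only for this example, and the code makes no network call.

```python
import json


def build_find_request(db, selector, fields=None, limit=25):
    """Return (method, path, body) for a Mango query against database `db`."""
    body = {"selector": selector, "limit": limit}
    if fields:
        body["fields"] = list(fields)
    return "POST", f"/{db}/_find", json.dumps(body, sort_keys=True)


# Hypothetical data: movies released after 2010, returning two fields.
method, path, payload = build_find_request(
    "movies",
    {"year": {"$gt": 2010}},
    fields=["title", "year"],
)
```

The selector says what to match rather than how to find it, which is what makes this style of query a good fit for ad hoc use.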
The Cloudant Search codebase has seen two major versions since its introduction in 2011, and it has four years of production experience behind it. For a look at Cloudant Search over time, see my earlier post about the 2.0 release, this intro presentation by Benjamin Young, and the faceted search tutorial by Alan Hoffman.
Do check out the repos at https://github.com/cloudant-labs/clouseau and https://github.com/cloudant-labs/dreyfus. We encourage people to fork and modify the code for their own purposes. Any pull requests to the code or documentation will be gratefully received, though we only accept enhancements and bug fixes, not new bugs, thanks.
“Apache”, “CouchDB”, “Lucene”, “Apache CouchDB”, “Apache Lucene”, and the CouchDB and Lucene logos are trademarks or registered trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.