
IBM Cloudant has open sourced our Search stack, which powers the Apache® Lucene™ integration that comprises Cloudant’s full-text search system. In this article we will prepare and boot an Apache® CouchDB™ 2.0 development cluster with integrated search features. CouchDB 2.0 brings clustering and many more nice features, so let’s look at how we can combine CouchDB with the Apache Lucene text search engine library.

The Search stack consists of two additional repositories: Dreyfus and Clouseau.

We will need to build CouchDB from source. If you already have experience building CouchDB from source, you can skip most of the following setup section, but you will have to install Java 6 and Maven 3.2.5. You can see all changes that we have to make at https://github.com/cloudant/couchdb/tree/article-cloudant-com-dreyfus, which is also quite helpful if you get stuck.

Set Up CouchDB

We will need to clone CouchDB:

git clone https://github.com/apache/couchdb
cd couchdb
git checkout 2.0.0

To compile CouchDB on OS X you will need to have the OS X command-line tools installed. (If you don’t use OS X, please read https://github.com/cloudant/couchdb/blob/article-cloudant-com-dreyfus/INSTALL.Unix.md instead.) You can install the tools via Terminal:

xcode-select --install

You will also need to install these dependencies (brew on OS X is used here):

brew install autoconf
brew install autoconf-archive
brew install automake
brew install libtool
brew install erlang
brew install icu4c
brew install spidermonkey
brew install curl
brew install pkg-config
brew install haproxy

Install Maven 3.2.5 And Java 6

Additionally we have to install Maven and Java 6. You can get Java 6 from: https://support.apple.com/kb/DL1572

You will also need Maven 3.2.5 in your PATH. You can get it from: http://mirror.23media.de/apache/maven/maven-3/3.2.5/binaries/apache-maven-3.2.5-bin.tar.gz. For installation instructions, see https://maven.apache.org/install.html
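Before building, it can help to confirm that the tools installed so far are actually on your PATH. The following is a minimal sketch, not part of CouchDB: the list of executable names is an assumption based on the packages installed above, so adjust it to your setup.

```python
import shutil

# Executables the steps in this article rely on (assumed names, adjust as needed).
REQUIRED_TOOLS = ["git", "erl", "java", "mvn", "curl",
                  "autoconf", "automake", "pkg-config"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```

This only checks that each tool exists, not its version, so you still need to make sure that java and mvn point to Java 6 and Maven 3.2.5.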

With everything installed, go to the root of the CouchDB checkout and run:

./configure --disable-fauxton --disable-docs
make

CouchDB is split into smaller sub-repositories. The ./configure script kicks off rebar, which pulls each sub-repository into the src folder.

When the compile is successful, we can test if CouchDB runs:

./dev/run --with-admin-party-please

In another Terminal window we can use curl to test if CouchDB answers:

$ curl localhost:15984
{"couchdb":"Welcome","version":"0fdc50b","vendor":{"name":"The Apache Software Foundation"}}

Yay! Looks good! We can now stop the ./dev/run script.
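If you prefer a scripted check over reading the curl output by eye, the same test can be sketched in Python. This is only an illustration: the function names are made up for this example, and the URL assumes the first dev node is listening on port 15984 as in the curl call above.

```python
import json
import urllib.request


def parse_welcome(body):
    """Parse the JSON body of GET / and return (is_couchdb, version)."""
    info = json.loads(body)
    return info.get("couchdb") == "Welcome", info.get("version")


def check_node(url="http://localhost:15984/", timeout=5):
    """Fetch the welcome document from a running node and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_welcome(response.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        is_couchdb, version = check_node()
    except (OSError, ValueError) as error:
        # URLError is a subclass of OSError; ValueError covers a non-JSON body.
        print("No usable answer from CouchDB:", error)
    else:
        if is_couchdb:
            print("CouchDB answered, version:", version)
        else:
            print("Something answered, but it does not look like CouchDB.")
```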

Build CouchDB with Search

This section explains the small changes needed to integrate Dreyfus into the CouchDB build, with some background on why each change is necessary. All code is also available at https://github.com/cloudant/couchdb/tree/article-cloudant-com-dreyfus, so if you get stuck or want a shortcut, check out the branch article-cloudant-com-dreyfus from https://github.com/cloudant/couchdb and proceed to the Recompiling CouchDB section below.

The dependencies for the Erlang parts of CouchDB are defined in CouchDB’s rebar configuration script, rebar.config.script. We have to add Dreyfus to it:

diff --git a/rebar.config.script b/rebar.config.script
index c194f3f..740d2ae 100644
--- a/rebar.config.script
+++ b/rebar.config.script
@@ -58,7 +58,8 @@ DepDescs = [
 {rexi,             "rexi",             "a327b7dbeb2b0050f7ca9072047bf8ef2d282833"},
 {snappy,           "snappy",           "0ab2796f82789895a2a86d403e63f3823d3c5a1d"},
 {setup,            "setup",            "d0a9b722485639fc43ccbfc4267f3a2dd9aa9d5a"},
-{meck,             "meck",             {tag, "0.8.2"}}
+{meck,             "meck",             {tag, "0.8.2"}},
+{dreyfus,           {url, "https://github.com/cloudant-labs/dreyfus"}, "5f113370a1273dd1bdc981ca3ea98767bca0382d"}
 ],

 BaseUrl = "https://git-wip-us.apache.org/repos/asf/", 

We also have to add Dreyfus to our reltool.config located at rel/reltool.config:

diff --git a/rel/reltool.config b/rel/reltool.config
index 7699f3e..c209507 100644
--- a/rel/reltool.config
+++ b/rel/reltool.config
@@ -59,7 +59,8 @@
         oauth,
         rexi,
         setup,
-        snappy
+        snappy,
+        dreyfus
     ]},
     {rel, "start_clean", "", [kernel, stdlib]},
     {boot_rel, "couchdb"},
@@ -116,7 +117,8 @@
     {app, oauth, [{incl_cond, include}]},
     {app, rexi, [{incl_cond, include}]},
     {app, setup, [{incl_cond, include}]},
-    {app, snappy, [{incl_cond, include}]}
+    {app, snappy, [{incl_cond, include}]},
+    {app, dreyfus, [{incl_cond, include}]}
 ]}.

 {overlay_vars, "couchdb.config"}.

Additionally we have to register the Dreyfus extensible plugin interface (EPI) in rel/apps/couch_epi.config:

diff --git a/rel/apps/couch_epi.config b/rel/apps/couch_epi.config
index a07ae2a..86ddfeb 100644
--- a/rel/apps/couch_epi.config
+++ b/rel/apps/couch_epi.config
@@ -17,5 +17,6 @@
     global_changes_epi,
     mango_epi,
     mem3_epi,
-    setup_epi
+    setup_epi,
+    dreyfus_epi
 ]}.

Get the Queryserver working with Dreyfus

The Queryserver from CouchDB has to learn how to handle the views that create a search index. In order to make it work we have to add dreyfus.js to our Queryserver:

curl https://raw.githubusercontent.com/cloudant/couchdb/c323f194328822385aa1bb2ab15b927cc604c4b7/share/server/dreyfus.js > share/server/dreyfus.js

The build then must include our new dependencies for the Queryserver:

diff --git a/support/build_js.escript b/support/build_js.escript
index 5050fd6..47c69cc 100644
--- a/support/build_js.escript
+++ b/support/build_js.escript
@@ -26,6 +26,7 @@ main([]) ->
                "share/server/state.js",
                "share/server/util.js",
                "share/server/validate.js",
+               "share/server/dreyfus.js",
                "share/server/views.js",
                "share/server/loop.js"],

@@ -36,6 +37,7 @@ main([]) ->
                    "share/server/state.js",
                    "share/server/util.js",
                    "share/server/validate.js",
+                   "share/server/dreyfus.js",
                    "share/server/views.js",
                    "share/server/coffee-script.js",
                    "share/server/loop.js"],

Now that we have added dreyfus.js we have to add the exposed functions to loop.js to be able to call them in a view:

diff --git a/share/server/loop.js b/share/server/loop.js
index e1226c3..692eacb 100644
--- a/share/server/loop.js
+++ b/share/server/loop.js
@@ -28,6 +28,7 @@ function init_sandbox() {
     sandbox.send = Render.send;
     sandbox.getRow = Render.getRow;
     sandbox.isArray = isArray;
+    sandbox.index = Dreyfus.index;
   } catch (e) {
     //log(e.toSource());
   }
@@ -127,7 +128,8 @@ var Loop = function() {
     "add_lib"  : State.addLib,
     "map_doc"  : Views.mapDoc,
     "reduce"   : Views.reduce,
-    "rereduce" : Views.rereduce
+    "rereduce" : Views.rereduce,
+    "index_doc": Dreyfus.indexDoc
   };
   function handleError(e) {
     var type = e[0];

Talking to Clouseau

Our node running Dreyfus must know where Clouseau is running in order to communicate with it. One Erlang node will talk to one Clouseau instance. From reading the source, Dreyfus gets that information from the CouchDB config files.

A config entry for node1 for a dev cluster running on localhost would look like this:

[dreyfus]
name = clouseau1@127.0.0.1

For our dev cluster, we have to create a template entry that is used for each node. Later, ./dev/run will fill in the right values for us:

diff --git a/rel/overlay/etc/local.ini b/rel/overlay/etc/local.ini
index 58baefd..0be138c 100644
--- a/rel/overlay/etc/local.ini
+++ b/rel/overlay/etc/local.ini
@@ -106,3 +106,6 @@
 ; changing this.
 [admins]
 ;admin = mysecretpassword
+
+[dreyfus]
+name = {{clouseau_name}}

For a default three-node dev cluster, the script that boots the cluster conveniently creates and modifies the configuration files. So let’s inject our Dreyfus config here:

diff --git a/dev/run b/dev/run
index 2cb1fd7..cb77ade 100755
--- a/dev/run
+++ b/dev/run
@@ -171,7 +171,8 @@ def setup_configs(ctx):
             "node_name": "-name %s@127.0.0.1" % node,
             "cluster_port": cluster_port,
             "backend_port": backend_port,
-            "fauxton_root": "src/fauxton/dist/release"
+            "fauxton_root": "src/fauxton/dist/release",
+            "clouseau_name": "clouseau%d@127.0.0.1" % (idx+1)
         }
         if os.name == 'nt':
             # Erlang always wants UNIX-style paths
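To make the effect of these two changes concrete, here is a minimal sketch of how a placeholder such as {{clouseau_name}} could be filled in for each of the three nodes. It is not the actual code from dev/run: the render helper and the TEMPLATE constant are made up for illustration, and only the name format is taken from the diff above.

```python
def clouseau_name(idx):
    """Clouseau node name for the zero-based CouchDB node index `idx`."""
    return "clouseau%d@127.0.0.1" % (idx + 1)


def render(template, values):
    """Replace each {{key}} placeholder in `template` with its value."""
    for key, value in values.items():
        template = template.replace("{{%s}}" % key, str(value))
    return template


# The section we appended to rel/overlay/etc/local.ini.
TEMPLATE = "[dreyfus]\nname = {{clouseau_name}}\n"

if __name__ == "__main__":
    # Print the rendered [dreyfus] section for node1, node2 and node3.
    for idx in range(3):
        print(render(TEMPLATE, {"clouseau_name": clouseau_name(idx)}))
```

Each CouchDB node ends up pointing at its own Clouseau instance: node1 at clouseau1, node2 at clouseau2, and node3 at clouseau3.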

Recompiling CouchDB

We are almost there. Let’s recompile CouchDB with our latest additions:

./configure --disable-fauxton --disable-docs
make
./dev/run --with-admin-party-please

We now have a three-node CouchDB 2.0 cluster running with Dreyfus integrated!
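If the build or the cluster start fails at this point, a missed edit is a likely cause. The following sketch checks that every file touched in this article contains the text we added. The marker strings are taken from the diffs above; the script itself and its names are hypothetical helpers, meant to be run from the root of the CouchDB checkout.

```python
import os

# File (relative to the CouchDB checkout) -> text the edit should have added.
# An empty marker means the file only has to exist.
EXPECTED_MARKERS = {
    "rebar.config.script": "dreyfus",
    "rel/reltool.config": "dreyfus",
    "rel/apps/couch_epi.config": "dreyfus_epi",
    "support/build_js.escript": "share/server/dreyfus.js",
    "share/server/loop.js": "Dreyfus.indexDoc",
    "share/server/dreyfus.js": "",
    "rel/overlay/etc/local.ini": "{{clouseau_name}}",
    "dev/run": "clouseau_name",
}


def incomplete_edits(root="."):
    """Return the files under `root` that are missing or lack their marker."""
    incomplete = []
    for relative_path, marker in EXPECTED_MARKERS.items():
        path = os.path.join(root, relative_path)
        try:
            with open(path, encoding="utf-8") as handle:
                content = handle.read()
        except OSError:
            content = None  # file is missing or unreadable
        if content is None or marker not in content:
            incomplete.append(relative_path)
    return incomplete


if __name__ == "__main__":
    problems = incomplete_edits()
    if problems:
        print("Check these files again: " + ", ".join(problems))
    else:
        print("All edits from this article appear to be in place.")
```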

Booting Clouseau

In a separate Terminal window run:

git clone https://github.com/cloudant-labs/clouseau
cd clouseau
mvn scala:run -Dlauncher=clouseau1

Maven will download our dependencies and build everything we need. After Maven has finished, open two more Terminal windows.

In Terminal 2 run:

cd clouseau/
mvn scala:run -Dlauncher=clouseau2

In Terminal 3 run:

cd clouseau/
mvn scala:run -Dlauncher=clouseau3

Troubleshooting

You might get this exception from Clouseau:

[INFO] launcher 'clouseau1' selected => com.cloudant.clouseau.Main
java.lang.reflect.InvocationTargetException
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
    at scala_maven_executions.MainHelper.runMain(MainHelper.java:164)
  at scala_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
Caused by: java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
  at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:401)
   at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:370)
   at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:292)
   at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
 at java.lang.Thread.run(Thread.java:695)
   at overlock.threadpool.ErrorLoggedThread.run(NamedThreadFactory.scala:40)
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------

If you get an exception like this from Clouseau, your CouchDB with Dreyfus is not running. The "java.net.ConnectException: Connection refused" means that Clouseau could not establish a connection to the corresponding Dreyfus node, so just boot CouchDB and retry!
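A quick way to see whether the CouchDB node is up before starting Clouseau is to test whether its port accepts connections. This is a minimal sketch under one assumption: it probes the HTTP port 15984 used in the curl example earlier, which only tells you that the node process is running, not which port Clouseau itself connects to.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    if port_open("127.0.0.1", 15984):
        print("CouchDB node 1 is listening, Clouseau can be started.")
    else:
        print("Nothing is listening on 15984, start ./dev/run first.")
```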

Searching!

That’s it! We can now use CouchDB with Search powered by Apache Lucene. Feel free to follow our tutorials for search: https://cloudant.com/for-developers/search/ and https://cloudant.com/blog/search-faceting-from-scratch-2. There is also a video tutorial available at https://www.youtube.com/watch?v=IdiCmKINL9g.
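As a first experiment, the sketch below builds a design document with one search index and the URL to query it. It is not taken from this article: it assumes the conventions of Cloudant Search (an "indexes" member in the design document, an index function calling the index() function we exposed in loop.js, and a _search endpoint). The database, design document, index, and field names are all hypothetical.

```python
import json
from urllib.parse import quote, urlencode


def search_design_doc(index_name, field):
    """Build a design document with one search index over `field`."""
    index_function = (
        "function (doc) {\n"
        "  if (doc.%s) {\n"
        "    index(\"%s\", doc.%s);\n"
        "  }\n"
        "}" % (field, field, field)
    )
    return {"indexes": {index_name: {"index": index_function}}}


def search_url(base, db, ddoc, index_name, query):
    """Build the URL that runs the Lucene query `query` against an index."""
    path = "/%s/_design/%s/_search/%s" % (
        quote(db, safe=""), quote(ddoc, safe=""), quote(index_name, safe=""))
    return base.rstrip("/") + path + "?" + urlencode({"q": query})


if __name__ == "__main__":
    # Hypothetical names: database "animals", design document "lookup".
    print(json.dumps(search_design_doc("by_name", "name"), indent=2))
    print(search_url("http://localhost:15984", "animals", "lookup",
                     "by_name", "name:cat"))
```

You would save the printed document as _design/lookup in the database (for example with curl -X PUT) and then request the printed URL.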

© "Apache", "CouchDB", "Lucene", "Apache CouchDB", "Apache Lucene", and the CouchDB and Lucene logos are trademarks or registered trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.

The code used in this article is licensed under Apache License, Version 2.0, January 2004. For details, see: https://github.com/apache/couchdb/blob/master/LICENSE.

3 comments on "Enable Full Text Search in Apache CouchDB"

  1. Hi! I tried to follow the instructions but it crashes when it comes to:

    ./configure --disable-fauxton --disable-docs

    I get at the end of the message:
    Cloning into ‘docs’…
    fatal: repository ‘https://git-wip-us.apache.org/repos/asf/couchdb-documentation.git/’ not found
    ERROR: sh(git clone -n https://git-wip-us.apache.org/repos/asf/couchdb-documentation.git docs)
    failed with return code 128 and the following output:
    Cloning into ‘docs’…
    fatal: repository ‘https://git-wip-us.apache.org/repos/asf/couchdb-documentation.git/’ not found

    ERROR: ‘get-deps’ failed while processing /Users/myuser/couchdb2cloudant/couchdb: rebar_abort

    Do you have an idea what is wrong please?

    • Hi wassx, saw your comment and tried to find the author of this post but unfortunately, it looks like he is no longer with IBM. I will see if I can find someone else to help you out…

      Ron

  2. I was able to get it up & running, but I’m facing two issues:

    1) Replication isn’t working and I have no idea why.
    2) I would like to run `make release` to generate the compiled files to an output directory instead of just `make`. However, it looks like full-text search only works if I run couchdb from ./dev/run, not from a compiled directory generated by `make release`.
