Overview

This tutorial explains how we created a lightweight web-tracking app to record user actions on our site’s search engine page. See how we use the open source Piwik® web analytics app to collect information and Node.js® to store that data in Cloudant. Then try it yourself by implementing tracking on a demo app we provide. Here in Part 1, we focus on data collection. When you’re done, you can try Part 2, where we show how to visualize the data you’ve gathered.

Why we built this app

We had a problem here in the Cloud Data Services Developer Advocacy group. Glynn Bird created a great faceted search engine that we use on our site’s How-To’s page (read Glynn’s tutorial on creating your own faceted search engine).

Our How-To’s page is more sophisticated than a static web page. It uses AJAX to respond to user requests. Instead of refreshing the entire page, we update small parts of the page to show results. This meant that traditional server-side tracking tools that log events wouldn’t help us understand what users are doing in this dynamic context. We also ruled out available client-side tracking services, because they don’t offer full control over what you track, or how data is stored and analyzed. How could we collect and see the user activity data we wanted?

The answer was to create our own app to collect and analyze metrics. We attached link-tracking to the UI elements dynamically generated by our site’s DOM, and we persisted that data to prepare for future analysis.

Metrics app

Get deployed

You can preview the demo app to see how it works. But first things first. Here in Part 1, we’ll explain how this app collects metrics.

You can find all the code for Part 1 of this tutorial in the metrics-collector GitHub repo. The easiest way to explore the app is to deploy it to Bluemix (IBM’s open cloud platform for building, running, and managing applications). Open the repo’s README and click the Deploy to Bluemix button. When you click it, Bluemix creates and hosts a copy of the code repository. Thanks, Deploy to Bluemix button!

How it works

Here’s an architectural overview of our metrics collector. Its middleware component serves tracker.js and piwik.js, which perform the metrics collection work in the browser, and persists the resulting metrics data to the database. We use Cloudant as our database, a NoSQL JSON document store based on Apache CouchDB™.

Metrics collector architecture

Tracking user actions with Piwik

We use the Piwik library to capture search events generated in our web page. Piwik’s JavaScript tracking client can capture a host of client-side information, from basics like page views and outbound link clicks, down to the most detailed user events. For us, it captures the search activity by listening to events on the user interface elements that create a request: the search text box, and the checkboxes for filtering search results.

How-Tos search elements

To connect Piwik to the web page you want to track, all you do is add one simple line to that page. If you view the source of our How-Tos page, you’ll find the script tag include that reads:

<script src="//metrics-collector.mybluemix.net/tracker.js" siteid="cds.search.engine"></script>

That’s all we do in the HTML page we’re tracking—load the tracker.js script and pass it a single variable, siteid, which is a unique identifier that’s saved to the database with every event coming from the How-Tos page.
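To make the idea concrete, here is a minimal sketch of how a script like tracker.js could discover its own siteid. The helper name readSiteId and the fallback value are our own inventions for illustration; the real tracker.js may do this differently.

```javascript
// Hypothetical helper (not taken from tracker.js): find the <script> element
// that loaded tracker.js and return its siteid attribute, or a fallback.
function readSiteId(scripts, fallback) {
  for (const script of scripts) {
    const src = script.getAttribute('src') || '';
    // Match ".../tracker.js", optionally followed by a query string.
    if (/\/tracker\.js(\?|$)/.test(src)) {
      return script.getAttribute('siteid') || fallback;
    }
  }
  return fallback;
}

// In a browser you would pass the live collection of script elements:
//   const siteId = readSiteId(document.getElementsByTagName('script'), 'unknown');
```

Passing the list of script elements in as an argument keeps the helper easy to try outside a browser.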

Tip: You can use this tracking app for any type of web page or app, and you’re not limited to one at a time. The “application” identifier is the siteid, so if you use the same siteid on different web pages, their metrics are grouped and analyzed together. (You can still tell the pages apart via the trackPageView Piwik event, which records each page’s address in the database under the url key.)

To see how event collection works, go to the metrics collector app’s repo, open the js folder and look at the tracker.js file. Two interesting functions are customDataFn, which captures metadata about a user’s browser, and enableLinkTrackingForNode, which facilitates link-tracking for a DOM node and lets us programmatically attach tracking to individual UI elements as they appear.

You can find this line of code in the file cds.js in the search engine GitHub repo.

The point of this client-side event tracking is that every user action on the search engine interface results in an event submission back to the tracker that looks something like this:

Tracking payload URL submission

https://metrics-collector.mybluemix.net/tracker?
   search=&search_cat=[{"key":"topic","value":"Data Warehousing"},
   {"key":"topic","value":"Analytics"}]&
   search_count=7&
   idsite=cds.search.engine&
   rec=1&r=493261&h=17&m=46&s=48&
   url=https://developer.ibm.com/clouddataservices/how-tos/&
   _id=0e9dcf4b6b5b0dc7&
   _idts=1433860426&
   _idvc=2&
   _idn=0&
   _refts=0&
   _viewts=1433881201&
   _ref=https://google.com&
   send_image=0&
   pdf=1&qt=0&realp=0&wma=0&dir=0&fla=1&java=1&gears=0&ag=0&
   cookie=1&res=3360x2100&gt_ms=51&
   uap=MacIntel&
   uag=Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Firefox/31.0&
   date=2015-5-4
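
If you want to inspect a submission like this yourself, the following sketch decodes a tracker URL into a plain object using the standard URL class. The function name parseTrackingUrl is ours, and the sample URL is a shortened, percent-encoded version of the payload above, not a real captured request.

```javascript
// Decode a tracker request URL into a plain key/value object.
function parseTrackingUrl(trackingUrl) {
  const params = new URL(trackingUrl).searchParams;
  const fields = {};
  for (const [key, value] of params) {
    fields[key] = value; // every value arrives as a string
  }
  return fields;
}

// Shortened sample built for illustration only.
const facets = JSON.stringify([{ key: 'topic', value: 'Analytics' }]);
const sampleUrl =
  'https://metrics-collector.mybluemix.net/tracker?search=' +
  '&search_cat=' + encodeURIComponent(facets) +
  '&search_count=7&idsite=cds.search.engine&_idvc=2';

const parsed = parseTrackingUrl(sampleUrl);
console.log(parsed.idsite);                  // cds.search.engine
console.log(JSON.parse(parsed.search_cat));  // the facet list as an array
```

Note that numbers such as search_count are still strings at this stage; converting them is the server’s job.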

Pretty cool so far. We’ve implemented some custom event tracking on our search engine web app. Next, we persist the data so we can do some usage analytics.

Persisting usage data to Cloudant

We’re going to use the Cloudant NoSQL database to store our event data. We do so for a couple of reasons:

  • Flexibility. Cloudant stores its data as JSON documents. That format provides schema flexibility that’s a nice fit for the event data.
  • Availability. Cloudant provides highly available read-write access and supports many concurrent connections, which helps us avoid missing user interactions even under heavy load.

To take that tracking payload and persist it to a Cloudant database, we wrote a little Node.js Express app, server.js, which you’ll find in the metrics collector repo. This app accepts the data in an HTTP GET key-value-pair request, transforms it into JSON, and writes it to Cloudant. Here’s a sample JSON document showing how a record is stored in Cloudant:

Structure of a tracking payload document

   {
     "type": "search",              //Type of event being captured (currently pageView, search and link)
     "idsite": "cds.search.engine", //app id (must be unique)
     "ip": "75.126.70.43",          //ip of the client
     "url": "https://developer.ibm.com/clouddataservices/how-tos/",   //source url _for_ the event
     "geo": {                       //geo coordinates of the client (if available)
       "lat": 42.3596328,
       "long": -71.0535177
     },
     "search": "",         //Search text if any (specific to search events)
     "search_cat": [       //Faceted search info (specific to search events)
       {
         "key": "topic",
         "value": "Analytics"
       },
       {
         "key": "topic",
         "value": "Data Warehousing"
       }
     ],
     "search_count": 7,    //search result count (specific to search events)
     "action_name": "IBM Cloud Data Services - Developers Center - Products", //Document title (specific to pageView events)
     "link": "https://developer.ibm.com/bluemix/2015/04/29/connecting-pouchdb-cloudant-ibm-bluemix/", //_target url_ (specific to link events)
     "rec": 1,             //always 1
     "r": 297222,          //random string
     "date": "2015-5-4",    //event date time -yyyy-mm-dd
     "h": 16,              //event timestamp - hour
     "m": 20,              //event timestamp - minute
     "s": 10,              //event timestamp - seconds
     "$_id": "0e9dcf4b6b5b0dc7", //cookie visitor id
     "$_idts": 1433860426,       //first-visit timestamp for this visitor
     "$_idvc": 2,          //Number of visits by this visitor
     "$_idn": 0,           //Whether a new visitor or not
     "$_refts": 0,         //Referral timestamp
     "$_viewts": 1433881201,  //Last Visit timestamp
     "$_ref": "google.com",//Referral url
     "send_image": 0,      //used image to send payload
     "uap": "MacIntel",     //client platform
     "uab": "Netscape",     //client browser
     "pdf": 1,             //browser feature: supports pdf
     "qt": 0,              //browser feature: supports quickTime
     "realp": 0,           //browser feature: supports real player
     "wma": 0,             //browser feature: supports windows media player
     "dir": 0,             //browser feature: supports director
     "fla": 1,             //browser feature: supports shockwave
     "java": 1,            //browser feature: supports java
     "gears": 0,           //browser feature: supports google gears
     "ag": 0,              //browser feature: supports silverlight
     "cookie": 1,          //browser feature: has cookies
     "res": "3360x2100",   //browser feature: screen resolution
     "gt_ms": 51           //page generation time in milliseconds
   }
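
The transformation from query-string values to a document like this one can be sketched as follows. This is not the code from server.js: the names toTrackingDoc, coerce, and TEXT_KEYS, and the rule for choosing the event type, are assumptions made for illustration. The one firm constraint is that CouchDB-style databases reserve top-level fields starting with an underscore, which is why the cookie values appear with a $ prefix.

```javascript
// Assumed list of fields that must stay text even if they look numeric.
const TEXT_KEYS = new Set(['search', 'idsite', '_id']);

// Convert one raw query-string value into a typed value.
function coerce(key, raw) {
  if (key === 'search_cat') {
    try { return JSON.parse(raw); } catch (err) { return []; }
  }
  if (TEXT_KEYS.has(key) || typeof raw !== 'string') {
    return raw;
  }
  // Turn purely numeric strings into numbers; leave everything else as-is.
  return /^\d+$/.test(raw) ? Number(raw) : raw;
}

// Hypothetical transform from parsed query values to a storable document.
function toTrackingDoc(query, clientIp) {
  const doc = { ip: clientIp };
  for (const [key, raw] of Object.entries(query)) {
    // Top-level names starting with "_" are reserved, so _id becomes $_id.
    const name = key.startsWith('_') ? '$' + key : key;
    doc[name] = coerce(key, raw);
  }
  // Assumption: the presence of a field decides the event type.
  if ('link' in query) {
    doc.type = 'link';
  } else if ('search' in query) {
    doc.type = 'search';
  } else {
    doc.type = 'pageView';
  }
  return doc;
}
```

Calling toTrackingDoc with the parsed query and the client’s IP address yields an object ready to be written as a JSON document.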

Let’s look at server.js. First, we load in required modules, including one called cloudant (loaded from the file storage.js) that simplifies the process of connecting to a Cloudant database—much the same way the excellent nano library simplifies connecting to an Apache CouchDB database. (Cloudant is, in many ways, an extension of CouchDB.) We set up our database connection in the trackerDb variable initialization and add some secondary indices to it at the same time. (In Cloudant and in CouchDB, secondary indices are defined by JavaScript Map functions.)
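
For readers new to Map-based indices, here is a sketch of what a design document with one such index might look like. The design document name, the view name, and the choice of key are all made up for this example; the indices actually created by the metrics collector may differ.

```javascript
// Hypothetical design document: one view that counts events per site and type.
const designDoc = {
  _id: '_design/metrics',        // assumed name
  views: {
    events_by_site: {            // assumed name
      // The database stores the map function as a string and supplies emit().
      map: function (doc) {
        if (doc.idsite && doc.type) {
          emit([doc.idsite, doc.type], 1);
        }
      }.toString(),
      reduce: '_count'           // built-in reducer that counts emitted rows
    }
  }
};

console.log(designDoc.views.events_by_site.map);
```

Querying a view like this with group=true would return one count per [idsite, type] pair, for example the number of search events recorded for cds.search.engine.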

Then, we set up Express to serve the static JavaScript files. The following code around line 66 makes any file in the js directory web-accessible via the url http://metrics-collector.mybluemix.net/<filename>:

app.use(express.static(path.join(__dirname, 'js')));

Last but not least, the app accepts event-tracking data on the /tracker endpoint. In app.get("/tracker"... we take the data and use lodash to construct the JavaScript “tracking payload” object shown earlier.
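
The overall shape of such an endpoint can be sketched like this. It is a simplified stand-in, not the handler from server.js: makeTrackerHandler is a name we made up, the store argument is a hypothetical object with an insert(doc, callback) method, and the idsite check is our own assumption. The handler relies only on the Express request and response members req.query, req.ip, res.status, and res.json.

```javascript
// Build an Express-style handler for GET /tracker.
// `store` is a hypothetical object exposing insert(doc, callback).
function makeTrackerHandler(store) {
  return function trackerHandler(req, res) {
    // Assumption: reject events that do not say which site they belong to.
    if (!req.query.idsite) {
      return res.status(400).json({ error: 'idsite is required' });
    }
    // Combine the client address with the submitted key-value pairs.
    const doc = Object.assign({ ip: req.ip }, req.query);
    store.insert(doc, function (err) {
      if (err) {
        return res.status(500).json({ error: 'could not save event' });
      }
      res.status(200).json({ ok: true });
    });
  };
}

// With Express, this could be mounted next to the static middleware:
//   app.get('/tracker', makeTrackerHandler(someStore));
```

Injecting the store keeps the handler independent of any particular database client.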

You may have noticed that our Node.js Express app is doing double duty. Not only does it accept requests to save tracking information to Cloudant, but it also serves the JavaScript files tracker.js and piwik.js.

Implement tracking on a sample app

Now, try it for yourself. Test your deployment and implement tracking on a sample web app.

Clone the sample application

For this test, we’ll use the guitars faceted search engine app written by Glynn Bird.

  1. Copy the app to your local machine:

        git clone https://github.com/glynnbird/guitars

  2. Add the following tracking script tag to index.html:
    
        <script src="https://code.jquery.com/jquery-1.11.2.min.js"></script>
        <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/js/bootstrap.min.js"></script>
    
    <!-- Add the tracker script tag. Use the deployed URL for your instance. You can also pick your own siteid (no need to register it first). -->
    
        <script src="https://my-metrics-collector.mybluemix.net/tracker.js" siteid="test.metric.app"></script>
        <script src="guitars.js"></script>
        <meta name="viewport" content="width=device-width, initial-scale=1"></meta>
    
  3. Edit guitars.js to add the tracking code for dynamically generated content. Locate the following code around line 140 and match what you see here:
      $('#searchtitle').html(html);
      //Reset the tracking for these elements
      if ( typeof _paq !== 'undefined' ){
    	  _paq.push([ enableLinkTrackingForNode, $('#searchtitle')]);
      }
    

    Then around line 52:

      $.ajax(obj).done(function(data) {
        $('#loading').hide();
        if (callback) {
          callback(null, data);
        }
    
        //Track the search results, do not log the initial page load as a search
        if ( searchText !== "" || (filter && $.isArray(filter) && filter.length > 0 ) ){
        	if ( typeof _paq !== 'undefined' ){
        		_paq.push(['trackSiteSearch', searchText, JSON.stringify(filter), data ? (data.total_rows || 0) : 0 ] );
        	}
        }
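
The guard in that snippet can also be read as a small standalone function, which makes the rule easier to see and to try out. The helper trackSearch is hypothetical and is not part of guitars.js; it uses Array.isArray in place of $.isArray so that it runs without jQuery.

```javascript
// Hypothetical helper wrapping the guard logic from the snippet above.
// Returns true when a search event was queued, false otherwise.
function trackSearch(queue, searchText, filter, data) {
  const hasFilter = Array.isArray(filter) && filter.length > 0;
  if (searchText === '' && !hasFilter) {
    return false; // initial page load: nothing to record
  }
  if (typeof queue === 'undefined') {
    return false; // tracker script is not loaded
  }
  const total = data ? (data.total_rows || 0) : 0;
  queue.push(['trackSiteSearch', searchText, JSON.stringify(filter), total]);
  return true;
}

// In the page, the queue would be the global created by the tracker:
//   trackSearch(window._paq, searchText, filter, data);
```

Because the queue is passed in, a plain array is enough to check what would be sent to the tracker.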
    

Verify that the events are being recorded

  1. Go to Bluemix and locate your metrics-collector application.
  2. Click metrics-collector-cloudant-service.
  3. Click the Launch button.
  4. Click the tracker_db database and note the number of docs in the database.
  5. In your favorite browser, launch the guitars index.html.
  6. Search for some guitars and click on a few filters.
  7. Go back to the Cloudant dashboard and reload the page. You’ll see that the number of docs has increased.

You’ve now verified that the metrics collector application is correctly deployed on Bluemix and gathering data. In Part 2 of this tutorial, you’ll see how to represent that data graphically in a report.

Summary of metrics collection

Here in Part 1 of this tutorial, you learned how to use Piwik to collect user actions and persist the data to a Cloudant database. Now you’re ready for Part 2, Metrics Analytics, where you’ll learn how to display that data graphically in a report.



© “Apache”, “CouchDB”, “Apache CouchDB” and the CouchDB logo are trademarks or registered trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.
