You’ve completed Part 1: Metrics Collection, but what can you do with the data you collect? The reason we capture metrics is to understand how users engage with our apps. The next step is to analyze usage data in ways that uncover new insights.

Here in Part 2 of the tutorial, we use Cloudant’s secondary indexing engine to aggregate the metrics data, which we persisted as JSON. Then we pump the aggregated data into D3 for visualization and analysis. You can find all the code for this part of the tutorial in the metrics-analytics GitHub repo.

UI Overview

Begin by familiarizing yourself with the user interface. Visit our demo metrics visualization app. It shows usage reports for our developer.ibm.com/clouddataservices site’s How Tos page.

Metrics analytics D3 app

Exploring the UI

  1. On the left side of the screen, click the report By Events – total.
  2. Select a time period to view.

    On the upper left, above the report, click the date picker to select a date range for your query. Some relative options are built in for you, like Last 7 days and Last 30 days. Or you can enter a custom range.

  3. Choose how you want to see the data.

    Above the report, in the button bar, select the type of visualization for the data. You can choose a bar or pie chart, or see the raw data in a table.

  4. Switch between applications or pages.

    The drop-down list on the top right lets you select the application for which you want to view reports. As described in Part 1, you can collect metrics for several pages or apps. A unique ID (siteid) identifies each page or app, which is included in the tracking script tag like this:

        <script src="//metrics-collector.mybluemix.net/tracker.js" siteid="cds.search.engine"></script>
        

  5. Each of these widgets immediately sends a request for an aggregated view of the data. The response is JSON, which D3 uses to generate the chart in the center of the page.

Architecture Overview

Let’s look under the covers and see how it works. Part 1 of our tutorial showed how we were tracking usage of dynamically generated webpage elements and persisting the JSON for future analysis. Here in Part 2, we’re going to show you how to use the MapReduce indexing engine in Cloudant or Apache CouchDB™ to create materialized views and aggregate the data. We then pass the aggregation to D3, which renders our data in the browser as an SVG image that humans can analyze. Analytics!

While we won’t cover all the UI development details here, the interface is built with Bootstrap and AngularJS so our frontend is up and running quickly.

Here’s an overview of what our Simple Metrics app does:

Simple Metrics analytics architecture

Application Setup

Cloudant and CouchDB contain a simple Web server in addition to their database-clustering capabilities. The Web server lets us deploy our application as a “CouchApp.” From the CouchDB documentation: “CouchApps are web applications served directly from CouchDB, mostly driven by JavaScript and HTML5. If you can fit your application into those constraints, then you get CouchDB’s scalability and flexibility ‘for free’ …” It also provides a convenient way to write MapReduce views and store them in the database at the same time — a feature we use in this tutorial.

To build and deploy a CouchApp, you need the couchapp command-line tool. Install it by running the following in the command line:

$ sudo easy_install --upgrade couchapp

NOTE 1: Easy Install (easy_install) is a Python module that lets you automatically download, build, install, and manage Python packages. You probably already have it installed on your system, but if not, get it here.

NOTE 2: The drawback to CouchApps is that there is no middle tier between the client and the database. Any authentication credentials you need for the database, like passwords or API keys, must reside in the client (or you must give clients a password to the database itself), so the CouchApp strategy is inherently insecure and unfit for production apps. CouchApps are great for example apps like this one, though, because we can share the website and its data through CouchDB-style replication. Just make sure you use the Python flavor of CouchApp: there are a few different tools with different file layouts, and only the Python style referenced here will work for this tutorial.

Materialized Views via MapReduce Indexing

Cloudant and CouchDB borrow the MapReduce programming model, typically associated with distributed systems like Apache Hadoop™, and adapt it to build secondary indexes defined in JavaScript. Secondary indexes are often called “database views” and are akin to materialized views: they are persisted to disk, update incrementally as documents change, and come with built-in reduce functions for aggregation.

Read more about MapReduce in the Cloudant API documentation.

Now, let’s revisit the final product (a bar chart like the one below) and investigate how we’re processing the JSON that powers it.

D3 bar chart of events by type

The chart shows user events grouped by the following action types:

  • link counts clicks on links in the page. There were about 8 within the period specified.
  • pageView tracks visits to our How-Tos page. There were about 60 within our time period.
  • search counts searches that users executed on the page. There were about 35 searches.

Now for the JSON. Each action a visitor performs on the site generates an event document in the database, which looks like this:

{
  "type": "search",              //Type of event being captured (currently pageView, search and link)
  "idsite": "cds.search.engine", //app id (must be unique)
  "ip": "75.126.70.43",
  "url": "http://cloudant-labs.github.io/resources.html",   //the source
  "geo": {
    "lat": 42.3596328,
    "long": -71.0535177
  },
  "search": "",         //Search text if any (specific to search events)
  "search_cat": [       //Faceted search info (specific to search events)
    {
      "key": "topic",
      "value": "Analytics"
    },
    {
      "key": "topic",
      "value": "Data Warehousing"
    }
  ],
  "search_count": 7,    //search result count (specific to search events)
  "action_name": "IBM Cloud Data Services - Developers Center - Products", //Document title (specific to pageView events)
  "link": "https://developer.ibm.com/bluemix/2015/04/29/connecting-pouchdb-cloudant-ibm-bluemix/",  //the target
  "rec": 1,             //always 1
  "r": 297222,          //from the base Piwik library, random number, we don’t use this value
  "date": "2015-5-4",
  "h": 16,
  "m": 20,
  "s": 10,
  "$_id": "0e9dcf4b6b5b0dc7", //cookie visitor
  "$_idts": 1433860426,       //cookie visitor first-visit timestamp
  "$_idvc": 2,          //Number of visits in the session
  "$_idn": 0,           //Whether a new visitor or not
  "$_refts": 0,         //Referral timestamp
  "$_viewts": 1433881201,  //Last Visit timestamp
  "$_ref": "",          //Referral url
  "send_image": 0,      //used image to send payload
  "uap": "MacIntel",
  "uab": "Netscape",
  "pdf": 1,             //browser features: supports pdf, QuickTime, etc.
  "qt": 0,
  "realp": 0,
  "wma": 0,
  "dir": 0,
  "fla": 1,
  "java": 1,
  "gears": 0,
  "ag": 0,
  "cookie": 1,
  "res": "3360x2100",
  "gt_ms": 51           //from the base Piwik library, config generation time, we don’t use this value
}

Sample tracking event JSON document

To create the D3 chart, we need the following information from this JSON document: The type field tells us what type of event occurred. Since our app lets us track events by idsite, we also need that. For setting the time period, we need the date field too.

Let’s look at the MapReduce code to see how the database view extracts these values from all the documents in our database. If you haven’t already, clone the metrics-analytics GitHub repo. Open the views folder. You’ll see a bunch of other folders, each of which is a database view. Each of these folders contains two files:

  • map.js, which stores the map function
  • reduce.js (optional), which contains the reduce function

The map function runs first. Open the grouped_events folder to find the functions that aggregate information for our Events by Type bar chart example. The following screenshot shows the map function code in the upper left, and the output of the map function in the table below.

Map function for “Events by Type” chart

Map functions in Cloudant and CouchDB are JavaScript functions that the database runs against every document. The map function in this example checks a document for the required fields. If they are present, it emits a key-value pair where the key is an array consisting of idsite, the date (which we convert to separate numeric values for year, month, and day), and the event type. The value associated with this compound key is always 1. For instance, this map function found 3 documents that have idsite: cds.search.engine, date: 2015-5-1, and type: search, so it emitted that key three times.
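The map step described above might look roughly like the following. This is a minimal sketch, not the repo’s actual map.js: the field checks and the date parsing are assumptions based on the sample document, and the emit stub and sample documents exist only so the snippet runs outside the database. Inside Cloudant or CouchDB, emit is provided for you and the map function is stored as an anonymous function.

```javascript
// Stand-in for the emit() function that Cloudant/CouchDB provide to map functions.
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Sketch of a map function for the "Events by Type" view (assumed, not copied from the repo).
function map(doc) {
  if (doc.idsite && doc.date && doc.type) {
    // Convert "2015-5-1" into numeric [year, month, day].
    const parts = doc.date.split("-").map(Number);
    if (parts.length === 3 && parts.every(Number.isFinite)) {
      emit([doc.idsite, parts[0], parts[1], parts[2], doc.type], 1);
    }
  }
}

// Hypothetical documents, trimmed to the fields this view needs.
const docs = [
  { idsite: "cds.search.engine", date: "2015-5-1", type: "search" },
  { idsite: "cds.search.engine", date: "2015-5-1", type: "search" },
  { idsite: "cds.search.engine", date: "2015-5-1", type: "pageView" },
  { idsite: "cds.search.engine", type: "link" }, // no date, so nothing is emitted
];

docs.forEach(map);
console.log(JSON.stringify(emitted, null, 2));
```

Every emitted row has the value 1, so counting events later is just a matter of adding up the rows that share a key.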

Now let’s look at the reduce function in grouped_events. It simply calls one of the summing functions built into Cloudant and CouchDB, in this case sum(values). When the view is queried with grouping, the reduce runs over each unique combination of idsite, date, and type and adds up the values produced by the map function (each of which is 1). The aggregated output is then the frequency of each unique compound key in the data set. For the example in the previous paragraph, the reduce function output value is 3.

Reduce function for “Events by Type”
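To make the reduce step concrete, here is a small local simulation. It is a sketch under assumptions rather than the repo’s reduce.js: the sum stub stands in for the helper the database provides to JavaScript reduce functions, queryGrouped is a hypothetical function that imitates querying the view with grouping turned on, and the rows are made-up map output. The real database also sorts rows by key and handles re-reducing partial results, which this sketch ignores. Cloudant and CouchDB also ship a native _sum reducer that does the same addition.

```javascript
// Stand-in for the sum() helper available to JavaScript reduce functions.
function sum(values) {
  return values.reduce((total, v) => total + v, 0);
}

// Sketch of a reduce function: add up the 1s emitted by the map function.
function reduce(keys, values, rereduce) {
  return sum(values);
}

// Hypothetical helper that imitates a grouped view query:
// rows with identical keys are reduced together.
function queryGrouped(mapRows) {
  const groups = new Map();
  for (const row of mapRows) {
    const id = JSON.stringify(row.key);
    if (!groups.has(id)) {
      groups.set(id, { key: row.key, values: [] });
    }
    groups.get(id).values.push(row.value);
  }
  return [...groups.values()].map((g) => ({
    key: g.key,
    value: reduce(null, g.values, false),
  }));
}

// Made-up map output: three search events and one pageView on the same day.
const mapRows = [
  { key: ["cds.search.engine", 2015, 5, 1, "search"], value: 1 },
  { key: ["cds.search.engine", 2015, 5, 1, "search"], value: 1 },
  { key: ["cds.search.engine", 2015, 5, 1, "pageView"], value: 1 },
  { key: ["cds.search.engine", 2015, 5, 1, "search"], value: 1 },
];

const grouped = queryGrouped(mapRows);
console.log(JSON.stringify(grouped));
```

The search key appears three times in the map output, so its reduced value is 3, matching the example in the text.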

Now that we’ve aggregated stats on our data set, we have a nice compact chunk of JSON that we can easily send to the Web browser for visualization with D3. For each type of report you want to offer, you need to create a database view. Take a look at the other views we’ve written, like events_by_platform. Generating new views on your data is very straightforward, but note that they are persisted to disk for efficient traversal and incremental updating of new or changing JSON.
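As an illustration of how little code a new view needs, here is a hypothetical map function that counts events per platform. It is not the repo’s events_by_platform view: the choice of the uap field as the platform and the [idsite, platform] key layout are assumptions drawn from the sample document, and the emit stub is only there so the snippet runs outside the database.

```javascript
// Stand-in for the emit() function that Cloudant/CouchDB provide to map functions.
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Hypothetical map function: one row per event, keyed by site and platform.
function map(doc) {
  if (doc.idsite && doc.uap) {
    emit([doc.idsite, doc.uap], 1);
  }
}

// Made-up documents, trimmed to the fields this view reads.
const docs = [
  { idsite: "cds.search.engine", uap: "MacIntel", type: "search" },
  { idsite: "cds.search.engine", uap: "Win32", type: "pageView" },
  { idsite: "cds.search.engine", uap: "MacIntel", type: "link" },
  { idsite: "cds.search.engine", type: "pageView" }, // no platform, skipped
];

docs.forEach(map);
console.log(JSON.stringify(emitted));
```

Paired with a summing reduce and a grouped query, a view like this would return one count per platform for each site.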

Visual Analytics with D3 and JSON

We bring all the pieces of the puzzle together in a single-page app for visualization. It uses Bootstrap for layout and leverages AngularJS to make smooth state transitions when users switch between different reporting options.

The Code

It’s best if you read the code, but we’ll go over some high-level concepts here.

Notice that the <html> tag is a little out of the ordinary: it has an attribute, ng-app="visualizationApp". This tells AngularJS to look for a module called visualizationApp (which lives in app.js) and wire it up to that node and every node descended from it. In our case that means the entire webpage. The other important connection between our webpage and AngularJS is the <div> on Line 30 that has an attribute, ng-controller="vizModel", which tells AngularJS to set the scope of the vizModel controller to that node and every node descended from it. This way you can have different code control different parts of the page, but in our case we only have a single controller.

So where does the actual chart creation happen? Find the selectVisualization function (look for the line $scope.selectVisualization = function(visualization){). We access the view with the command:

couchApp.db.view(design + '/' + viewName, options)

This command gets the data from the view and passes it to the proper sub-function of getTotalEventsChartBuilder for rendering with D3. You can study the renderChart, renderTable, renderLine and renderPie sub-functions to see how we make graphics out of the view data.
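Before D3 can draw anything, the grouped view rows have to be shaped into a flat array it can bind to. The sketch below shows one way that step could work. It does not reproduce the repo’s chart builders: totalsByType is a hypothetical helper, and the row layout (the event type as the last element of the key, the count as the value) is an assumption based on the grouped_events view described earlier.

```javascript
// Hypothetical helper: collapse grouped view rows into one total per event type.
// Assumes each row looks like { key: [idsite, year, month, day, type], value: count }.
function totalsByType(rows) {
  const totals = {};
  for (const row of rows) {
    const type = row.key[row.key.length - 1];
    totals[type] = (totals[type] || 0) + row.value;
  }
  // Sort by type name so the bars appear in a stable order.
  return Object.keys(totals)
    .sort()
    .map((type) => ({ type, count: totals[type] }));
}

// Made-up view response covering two days.
const rows = [
  { key: ["cds.search.engine", 2015, 5, 1, "pageView"], value: 4 },
  { key: ["cds.search.engine", 2015, 5, 1, "search"], value: 3 },
  { key: ["cds.search.engine", 2015, 5, 2, "pageView"], value: 2 },
  { key: ["cds.search.engine", 2015, 5, 2, "link"], value: 1 },
];

const chartData = totalsByType(rows);
console.log(JSON.stringify(chartData));
```

An array of { type, count } objects like this maps directly onto a bar chart, a pie chart, or a table: one element per bar, slice, or row.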

Wrap

That wraps it up for Part 2 of this tutorial. I hope this whirlwind tour of MapReduce views and D3 visualization gave you a good feel for all the potential analytics you can develop on top of a Cloudant data store. If you modify the sample code and want to publish your own version, you’ll want to get up to speed on deploying CouchApps and you’ll probably also want to install CouchDB for local development and testing.

We touched on many areas we could explore further. For instance, there are a lot of slick options to control CouchApps with JavaScript. And playing around with MapReduce views or working on the charts could be a whole separate tutorial. Finally, we could have skipped the CouchApps approach and built an app that got the view data by making calls to the Cloudant/CouchDB REST API. We would have had to write more code, but it would be more modular and probably easier to integrate into an enterprise environment.
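For a sense of what the REST approach involves, the sketch below builds the URL for a grouped, date-bounded view query. The _design/.../_view/... path and the group, startkey, and endkey parameters are part of the Cloudant/CouchDB view API. Everything else is an assumption for illustration: buildViewUrl is a hypothetical helper, the host, database, and design document names are placeholders, and the key layout follows the grouped_events view described earlier.

```javascript
// Hypothetical helper: build a view query URL for one site and date range.
// from and to are { year, month, day } objects.
function buildViewUrl(baseUrl, db, designDoc, viewName, idsite, from, to) {
  const params = new URLSearchParams({
    group: "true",
    // View keys are JSON, so key parameters must be JSON-encoded.
    startkey: JSON.stringify([idsite, from.year, from.month, from.day]),
    // An empty object sorts after strings in view collation, so a trailing {}
    // includes every event type on the end date.
    endkey: JSON.stringify([idsite, to.year, to.month, to.day, {}]),
  });
  const path = [db, "_design", designDoc, "_view", viewName]
    .map(encodeURIComponent)
    .join("/");
  return `${baseUrl}/${path}?${params}`;
}

// Placeholder host, database, and design document names.
const url = buildViewUrl(
  "https://example.cloudant.com",
  "metrics",
  "app",
  "grouped_events",
  "cds.search.engine",
  { year: 2015, month: 5, day: 1 },
  { year: 2015, month: 5, day: 31 }
);
console.log(url);
```

A GET request to a URL like this returns the same grouped rows the CouchApp receives, so the charting code would not need to change.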

Let us know if you’re interested in learning more about any of these topics. Meanwhile, fork the repo and build something great. We can’t wait to see what you’ll create.



© “Apache”, “CouchDB”, “Hadoop”, “Apache CouchDB”, “Apache Hadoop”, and the CouchDB and Hadoop logos are trademarks or registered trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.
