While JavaScript usage continues to grow rapidly, including on the server side with Node.js, JavaScript developers have not been able to work directly with distributed computing engines such as Apache Spark, which does not provide a JavaScript API. The EclairJS project changes that by providing such an API, enabling developers to use Apache Spark’s large-scale analytics, streaming, SQL and machine learning features by writing only JavaScript.

EclairJS consists of two components. The first is the EclairJS server, called EclairJS-Nashorn, which sits in front of Apache Spark. The second is the EclairJS client, a Node module called EclairJS-Node, which communicates with Apache Spark through the server. Apache Spark is built on top of the Java Virtual Machine, and the EclairJS-Nashorn component allows us to extend the various parts of Apache Spark to support JavaScript natively.


simple-arch

In a production environment, the IT department would probably set up an Apache Spark cluster with the EclairJS server sitting next to the Apache Spark Master. But in lieu of the IT folks, and to help you try out EclairJS, we have created a Docker image containing an Apache Spark and EclairJS server setup that you can deploy on IBM Bluemix. With this setup running on Bluemix, it is a simple matter to also run a Node application in Bluemix and make it communicate with Apache Spark in the Docker container. Note that Bluemix provides a 30-day free trial (no credit card needed), so you can try out EclairJS in Bluemix at no cost.

In the rest of this post, I’ll first describe how to set up Docker and the Docker image containing Apache Spark and the EclairJS server in Bluemix. Then I’ll go on to describe a simple web application and how to deploy it onto Bluemix.

Setting up Docker

Docker containers provide a convenient way to package pre-built environments that can be quickly deployed, either locally on your development machine or to hosted solutions like Bluemix. You will need to have Docker running locally, as we will be using it to push the container to Bluemix. If you don’t already have Docker installed, follow this page to install it on your development machine and use the Getting Started Tutorial (version 1.10 or higher is needed).

You will also need to download the Bluemix command line tools as well as the Cloud Foundry command line tools (Bluemix is built on top of Cloud Foundry). Next you will need to install the appropriate IBM Containers Cloud Foundry plug-in for your operating system (follow step 5 on that page).

Bluemix provides every user with a private Docker registry where they can store Docker images, and you will use your registry to store a copy of the Spark-EclairJS image. Before you can use your registry, you need to give it a unique name. To do this from your command line, log into Bluemix and set the registry name:

$ cf login
$ cf ic namespace set your_registry_name_here
$ cf ic login

Setting Up Spark

With Docker set up, we can push the Spark-EclairJS Docker image to the registry by executing the following commands. Be sure to substitute the registry name you set above:

$ cf ic init
$ cf ic cpi eclairjs/minimal-gateway registry.ng.bluemix.net/<your_registry_name_here>/minimal-gateway

This one-time operation copies the eclairjs/minimal-gateway image into your Bluemix registry. Next we create a Docker container based on this image, which will provide a working environment for us to use. (If you are unfamiliar with the differences between Docker images and containers, see here.)

$ cf ic run --name eclairjs  -p 8888:8888 -m 128 registry.ng.bluemix.net/<your_registry_name_here>/minimal-gateway

This operation creates a Docker container named eclairjs based on the image. It may take a while to complete, but you can check on the status of the container by running the following command:

$ cf ic ps
CONTAINER ID        IMAGE                                                                      COMMAND             CREATED              STATUS                   PORTS               NAMES
879ada59-5e6        registry.ng.bluemix.net/<your_registry_name_here>/minimal-gateway:latest   ""                  About a minute ago   Running 27 seconds ago   8888/tcp            eclairjs

You can also monitor the state of the Docker container by going to Bluemix’s Dashboard, which lists all of the containers you have created. Once the status of the Spark-EclairJS container is Running, you need to request a public IP address for it, because Bluemix applications cannot connect to Bluemix Containers using their private IP addresses. The command to request a public IP address is:

$ cf ic ip request
OK
The IP address "<your.ip.address>" was obtained.

The command will output an IP address. To make the container reachable, we bind that address to it with the command:

$ cf ic ip bind <your.ip.address> eclairjs
OK
The IP address was bound successfully.

Building a Simple Spark Application

Now that we have a Spark instance running on Bluemix, we can start developing our Node program. For our example we will create a Bluemix application that uses Node to provide a webpage with a button that executes a simple Spark program. The full source code for this application can be found on GitHub.

A Bluemix Node application is very similar to any other Node application, with two differences. One, there is a manifest.yml file in the root of the project that allows us to control an application’s environment by specifying parameters such as memory and disk quotas. Two, Bluemix provides environment variables that define the host and port to listen on so that the application can be accessed.

As with any Node program we start with a package.json file, and we define our main file as index.js:

{
  "name": "bluemix-simple-template",
  "version": "0.0.1",
  "scripts": {
    "start": "node --harmony index.js"
  },
  "main": "index.js",
  "dependencies": {
    "express": "~4.12.0",
    "eclairjs": "*"
  }
}

The index.js file (located in the same directory as package.json) is shown in the next two code snippets. It contains our backend logic and the setup for Express, which we use to provide a simple web frontend for the example. The only Bluemix-specific parts here are the environment variables VCAP_APP_HOST and VCAP_APP_PORT that are passed to Express:


// index.js 

// use express for a webserver
var express = require('express');

var port = process.env.VCAP_APP_PORT;
var host = process.env.VCAP_APP_HOST;

// setup the express server
var app = express();
app.use(express.static('public'));

var server = app.listen(port, host, function () {
});
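
Outside Bluemix these two variables are normally unset, so running the snippet above locally passes undefined values to Express. A minimal sketch of one way to handle that is shown here. The helper name resolveListenAddress and the fallback values localhost and 3000 are assumptions made for illustration and are not part of the original application:

```javascript
// Sketch: resolve the listen address from the Bluemix-provided variables,
// falling back to local defaults. The helper name and the fallback values
// (localhost, 3000) are illustrative assumptions.
function resolveListenAddress(env) {
  var port = parseInt(env.VCAP_APP_PORT, 10);
  return {
    host: env.VCAP_APP_HOST || 'localhost',
    port: isNaN(port) ? 3000 : port
  };
}

var address = resolveListenAddress(process.env);
console.log('Would listen on ' + address.host + ':' + address.port);
```

With a helper like this, the call to app.listen could take address.port and address.host in place of the raw environment values.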

We now use Express to create an endpoint /do which will execute a very simple Apache Spark program. In this case we use Spark to parallelize an array of numbers, which distributes the data among the Spark worker nodes. This returns a Resilient Distributed Dataset (RDD) that represents the data stored across all the nodes. We then multiply each number in the RDD by 2 using the map operator, which traverses all the data and generates a new RDD. After that we collect the data from the nodes and send it back to the browser:


// index.js, continued

var eclairjs = require('eclairjs');

// our main entry point
app.get('/do', function (req, res) {
  var spark = new eclairjs();
  var sc = new spark.SparkContext("local[*]", "Simple Spark Program");

  var rdd = sc.parallelize([1.10, 2.2, 3.3, 4.4]);

  var rdd2 = rdd.map(function(num) {
    return num * 2;
  });

  rdd2.collect().then(function(results) {
    sc.stop();
    res.json({result: results});
  }).catch(function(err) {
    sc.stop();
    res.json({error: err});
  });
});
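
To make the parallelize, map and collect steps more concrete, here is a plain-JavaScript analogy that runs without Spark. It is only a sketch of the idea: the "partitions" are ordinary array slices, the function names splitIntoPartitions, mapEach and collectAll are made up for this illustration, and nothing here reflects how Spark or EclairJS are implemented internally:

```javascript
// Plain-JavaScript analogy of parallelize -> map -> collect.
// Nothing here talks to Spark; "partitions" are just array slices and
// all function names are hypothetical.

// Split the data into roughly equal chunks, like distributing it to nodes.
function splitIntoPartitions(data, numPartitions) {
  var partitions = [];
  var size = Math.ceil(data.length / numPartitions);
  for (var i = 0; i < data.length; i += size) {
    partitions.push(data.slice(i, i + size));
  }
  return partitions;
}

// Apply fn to every element of every partition, producing new partitions.
function mapEach(partitions, fn) {
  return partitions.map(function (part) {
    return part.map(function (x) { return fn(x); });
  });
}

// Gather all partitions back into one array, wrapped in a promise to
// mirror the promise-based collect() call used in index.js.
function collectAll(partitions) {
  var flat = partitions.reduce(function (all, part) {
    return all.concat(part);
  }, []);
  return Promise.resolve(flat);
}

var doubled = mapEach(splitIntoPartitions([1.10, 2.2, 3.3, 4.4], 2), function (num) {
  return num * 2;
});

collectAll(doubled).then(function (results) {
  console.log(results); // [ 2.2, 4.4, 6.6, 8.8 ]
});
```

The same four values, wrapped as {result: [...]}, are what the /do endpoint is expected to send back to the browser.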

We also create a public/index.html file to contain our frontend. It provides a simple form with a button that, when pressed, calls the /do endpoint using an XMLHttpRequest and then outputs the results to the web page.

<!-- public/index.html -->

<html>
<head>
  <title>EclairJS Bluemix Example</title>
  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css" integrity="sha512-dTfge/zgoMYpP7QbHy4gWMEGsbsdZeCXz7irItjcC3sPUFtf0kuFbDz/ixG7ArTxmDjLXDmezHubeNikyKGVyQ==" crossorigin="anonymous">

  <script>
    function _loadListener() {
      document.getElementById("result").innerHTML = this.responseText;
      // re-enable the run button
      document.getElementById("runBtn").removeAttribute("disabled");
      document.getElementById("runMsg").style.display = "none";
      document.getElementById("finishedMsg").style.display = "inline";
    }

    function doExample() {
      // disable the run button
      document.getElementById("runBtn").setAttribute("disabled", "true");

      document.getElementById("finishedMsg").style.display = "none";
      document.getElementById("runMsg").style.display = "inline";

      var xhr = new XMLHttpRequest();
      xhr.addEventListener("load", _loadListener);
      xhr.open("GET", "/do");
      xhr.send();
    }
  </script>
</head>
<body>
  <div class="container">
    <h2>EclairJS Bluemix Example</h2>
    <div>
      <button id="runBtn" type="button" onclick="doExample()" class="btn btn-primary">Run</button>
      <div style="display: inline-block; margin-left: 8px;">
        <span id="runMsg" class="btn status bg-info">Running...</span>
        <span id="finishedMsg" class="btn status bg-success">Finished</span>
      </div>
    </div>
    <div id="result"></div>
  </div>
</body></html>
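
The page above writes the raw response text into the result element, so the user sees the JSON string as sent by the server. If a friendlier display is wanted, a small formatting helper could be applied first. The following is a sketch only: the function name formatResult and the message wording are assumptions, and it relies on the {result: ...} and {error: ...} shapes produced by the /do endpoint shown earlier:

```javascript
// Sketch: turn the JSON text returned by /do into a short message.
// The function name and message wording are illustrative assumptions.
function formatResult(responseText) {
  var payload;
  try {
    payload = JSON.parse(responseText);
  } catch (e) {
    return 'Unexpected response: ' + responseText;
  }
  if (payload && Array.isArray(payload.result)) {
    return 'Result: ' + payload.result.join(', ');
  }
  if (payload && payload.error !== undefined) {
    return 'Error: ' + JSON.stringify(payload.error);
  }
  return 'Unexpected response: ' + responseText;
}

console.log(formatResult('{"result":[2.2,4.4,6.6,8.8]}')); // Result: 2.2, 4.4, 6.6, 8.8
```

Inside _loadListener, the helper's return value could be assigned to the result element's textContent instead of assigning this.responseText to innerHTML.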

Deploying to Bluemix

The final step is to deploy your application to Bluemix. To do this, log in to your Bluemix account, go to the Dashboard and click on Create App. Choose Web for the type of application, select the SDK for Node.js bundle, and give your application a name during the rest of the setup. Once you have completed these steps on Bluemix, you will need to edit the manifest.yml file and change the value of the name parameter to the application name you just specified in Bluemix. In addition, change the value of JUPYTER_HOST to the public IP address you previously bound to the Spark-EclairJS container.

applications:
- path: .
  memory: 1024M
  instances: 1
  domain: mybluemix.net
  name: Your_Application_Name
  disk_quota: 1024M
  env:
    JUPYTER_HOST: YOUR.IP.ADDRESS
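
Entries under env in the manifest are made available to the application as environment variables, so JUPYTER_HOST can be read in Node through process.env. Forgetting to replace the placeholder is an easy mistake, and a startup check along the following lines could catch it early. This is a sketch under assumptions: the function name checkGatewayHost and the error message are invented for illustration, and the address in the usage line is a documentation-only example value:

```javascript
// Sketch: fail fast if JUPYTER_HOST is missing or still the placeholder
// from manifest.yml. The function name and message are illustrative.
function checkGatewayHost(env) {
  var host = env.JUPYTER_HOST;
  if (!host || host === 'YOUR.IP.ADDRESS') {
    throw new Error(
      'JUPYTER_HOST is not set to the public IP of the Spark-EclairJS container'
    );
  }
  return host;
}

// Example with a documentation-only address; a real application would
// pass process.env instead.
console.log(checkGatewayHost({ JUPYTER_HOST: '203.0.113.10' }));
```

Calling checkGatewayHost(process.env) near the top of index.js would stop the application at startup with a clear message rather than failing later on the first request to /do.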

Bluemix is now ready to host the actual code of your application, so you can push your application to Bluemix by executing the following three commands in the directory where the application exists:

$ bluemix api https://api.ng.bluemix.net
$ bluemix login
$ cf push
...
requested state: started
instances: 1/1
usage: 1G x 1 instances
urls: yourappname.mybluemix.net
last uploaded: Sat Aug 13 02:18:39 UTC 2016
stack: unknown
buildpack: SDK for Node.js(TM) (ibm-node.js-4.4.7, buildpack-v3.6-20160715-0749)

     state     since                    cpu    memory        disk          details
#0   running   2016-08-12 07:19:36 PM   0.1%   85.6M of 1G   72.4M of 1G

Pushing the application will cause Bluemix to install any dependencies listed in the package.json file. Bluemix will list the result of the push and if everything went as planned it will show the state of the application as running.

You are now ready to access your application from a browser. The application’s public URL is provided as the value of the urls: field that was returned when you pushed the application, and it should have the form yourappname.mybluemix.net. Assuming you can access the URL, clicking the blue Run button will call the /do endpoint, execute the Spark code, and output the results on the page as shown in the screenshot below.


screenshot

Bluemix allows Node applications to easily scale, and EclairJS now allows Node developers to directly harness the large-scale data processing power of Apache Spark. Visit the EclairJS project to find out more and try out our examples including Spark Streaming and Machine Learning as well as our application examples.
