EclairJS provides JavaScript developers with an API to Apache Spark so they can take advantage of Spark’s scalable streaming, SQL, machine-learning, and graph capabilities.


Many web app developers use Node.js and code their applications in JavaScript. In addition to having one of the richest programming environments in terms of the number of available packages, Node.js can scale to handle very large numbers of simultaneous requests. To process all these requests, Node hands off any significant processing to other engines that are optimized for particular compute tasks.

Apache Spark is one such engine, although it is unusual in being optimized for several different types of compute tasks. In particular, Apache Spark has:

  • An SQL engine
  • A component for querying and analyzing streaming data
  • A rich set of Machine Learning (ML) algorithms
  • An engine for querying and manipulating graph representations of data such as social networks

Moreover, these capabilities are integrated so that you can, for example, run queries against streams of data. Last but not least, Spark is fast because it has parallelized operators which work on data kept in memory whenever possible, and it is massively scalable from your laptop to thousands of nodes in distributed clusters.

To take advantage of Spark from JavaScript, you should simply be able to “npm install” an appropriate module, require that module in your Node application, and then write JavaScript code including Spark operators and objects that act just like other JavaScript constructs. You can’t do this with Spark out of the box. With EclairJS installed, you can incorporate Apache Spark into your web applications in exactly this way.

EclairJS is particularly suited to “chatty” applications, such as user-facing, interactive applications that use input from users to drive analyses or other operations performed in Spark. The ultimate “chatty” applications are those that handle streams of data, such as applications that continuously update visualizations like maps or graphs. EclairJS uses WebSockets to communicate between Node applications and Spark, and these are well suited to handling streaming data.
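To make the streaming case concrete, here is a minimal sketch of a Node application folding pushed batches into running totals that a map or graph could be redrawn from. It does not use the EclairJS API: Node’s built-in EventEmitter stands in for the socket connection, and the message shape (a JSON array of region and count pairs) and the names connection and countsByRegion are assumptions made up for illustration.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for a socket connection to the Spark side.
// A real application would receive these messages over a WebSocket.
const connection = new EventEmitter();

// Running totals that a visualization would be redrawn from.
const countsByRegion = {};

// Each message is assumed to carry one batch of results as JSON.
connection.on('message', (raw) => {
  const batch = JSON.parse(raw);
  for (const { region, count } of batch) {
    countsByRegion[region] = (countsByRegion[region] || 0) + count;
  }
});

// Simulate two batches arriving from the stream.
connection.emit('message', JSON.stringify([
  { region: 'north', count: 2 },
  { region: 'south', count: 1 },
]));
connection.emit('message', JSON.stringify([{ region: 'north', count: 3 }]));

console.log(countsByRegion); // { north: 5, south: 1 }
```

The point of the sketch is the shape of the interaction: the application registers a handler once and then reacts to each batch as it arrives, rather than polling for results.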

EclairJS has two components. The first is the client, the module installed by “npm install eclairjs”; it is the interface that the web application calls from JavaScript. The second is the EclairJS Server, which runs on a Spark cluster. The Server does most of the heavy lifting, handling the conversions between JavaScript and Spark’s existing Java API. The project has been through several iterations of this conversion process, and our informal measurements suggest that EclairJS’s performance is close to Spark’s native Java performance.
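The client and server split described above can be sketched roughly as follows. This is an illustration of the general pattern only, not EclairJS code: RemoteDataset, fakeServerHandle, the operation names, and the JSON message format are all hypothetical, and an in-process function stands in for the network transport. The client object merely records the operations chained on it, and the server stand-in executes them against data that lives on its side.

```javascript
// Hypothetical sketch: none of these names come from the EclairJS API.

// "Server" side: owns the data and executes the requested steps.
const serverDatasets = { numbers: [1, 2, 3, 4] };
const serverOperations = {
  filterGreaterThan: (rows, n) => rows.filter((x) => x > n),
  multiplyBy: (rows, n) => rows.map((x) => x * n),
};

function fakeServerHandle(message) {
  const request = JSON.parse(message);
  let rows = serverDatasets[request.dataset];
  for (const step of request.steps) {
    rows = serverOperations[step.op](rows, step.arg);
  }
  return JSON.stringify({ result: rows });
}

// "Client" side: a proxy whose methods only record what should happen.
class RemoteDataset {
  constructor(dataset, steps = []) {
    this.dataset = dataset;
    this.steps = steps;
  }
  filterGreaterThan(n) {
    return new RemoteDataset(this.dataset, [...this.steps, { op: 'filterGreaterThan', arg: n }]);
  }
  multiplyBy(n) {
    return new RemoteDataset(this.dataset, [...this.steps, { op: 'multiplyBy', arg: n }]);
  }
  // collect() is the only call that sends anything to the server.
  async collect(send) {
    const reply = await send(JSON.stringify({ dataset: this.dataset, steps: this.steps }));
    return JSON.parse(reply).result;
  }
}

// Stand-in transport; a real client would send the message over a socket.
const send = async (message) => fakeServerHandle(message);

async function main() {
  const result = await new RemoteDataset('numbers')
    .filterGreaterThan(2)
    .multiplyBy(10)
    .collect(send);
  console.log(result); // [ 30, 40 ]
  return result;
}

const pending = main();
```

Because only a compact description of the work crosses the wire, the calling code reads like ordinary JavaScript method chaining while the computation happens where the data is.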

In addition to serving the EclairJS Client, the Server can support other applications. For example, it can act as a kernel for Jupyter Notebooks, enabling users to run JavaScript in their notebooks.

We hope you find EclairJS useful, and we are very keen on recruiting new contributors and learning about new use cases!

Why should I contribute?

Spark is one of the fastest growing projects at Apache, and it is quickly making inroads into many businesses and enterprises. By becoming involved in the EclairJS project now, you can get a jump on this new technology, and be prepared to make use of its wide range of functionalities from within your existing Node.js web app development environment.

In addition, EclairJS strives to mirror as much of the Spark API as possible. Admittedly, the project’s existing contributors have their own biases and will prioritize those parts of the API that are most important to them. So, if there is a particular Spark operation or set of operations that would be useful for your project, it may be quicker to contribute them to the project now rather than wait for today’s contributors to implement them.

What technology problem will I help solve?

Spark already has several built-in language interfaces, namely Scala, Java, and Python, plus some support for R. However, it lacks built-in support for typical web app development platforms such as Node.js.

The EclairJS project fills this gap by providing Spark with a JavaScript API that aims to cover the full range of Spark’s capabilities. Moreover, it surfaces Spark in a way that is completely natural for the JavaScript developer: just define variables and functions as you normally would, but now have them cover Spark constructs as well.

How will EclairJS help my business?

Web app and server-side coding are typically divided between two different groups of developers. Modern web apps are most often coded in JavaScript on a platform like Node.js, while server-side coding is often done in Java. Being a JVM-based system, Spark has tended to reinforce that distinction. However, with a Spark API for JavaScript, developers can now code applications from “front-to-back” in the same language. This should save time (and money) in the development process, both because JavaScript is generally considered to be a simpler language than Java and because there is no need to coordinate the work of separate development teams.
