Today, we’re introducing a refactored and streamlined Simple Data Pipe, our open-source data movement project. While the workflow for piping data has changed, the new architecture opens up more free options for data movement onto, or off of, the IBM cloud.
Why change The Pipe?
Services are changing rapidly on IBM’s Bluemix application platform. As these services evolve, we wanted to create a more modular Simple Data Pipe that could better deal with new features and brand new products.
If you’re already using the Simple Data Pipe, don’t fear. We can still move data to dashDB, IBM’s cloud data warehouse. I’ll cover the mechanics of analytics workflows later on. For now, let’s look at The Pipe’s new architecture and our motivations behind it.
A simpler data pipe architecture
It’s all about getting data. The big problem the Simple Data Pipe solves has always been sourcing data from disparate Web APIs. The Pipe captures that data in its native structure and persists it in a database that’s flexible enough to adapt to your plans for processing it.
The new Simple Data Pipe no longer assumes that you plan to process data for a particular use (analytics), in a particular place (dashDB). We’ve modularized the architecture of The Pipe by separating the step of landing data in Cloudant from the step of moving data to a different, more specialized place. Here’s an “annotated” architecture diagram:
Instead of automating the process of moving data from REST sources → Cloudant → dashDB, the new Simple Data Pipe is scoped more narrowly to REST sources → Cloudant and ends the process there. It’s a cleaner, more modular approach that we believe better handles the rate of innovation in the Bluemix ecosystem and makes the data pipe more useful to applications beyond analytics use-cases.
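To make the narrower REST sources → Cloudant scope concrete, here is a minimal Python sketch of the “landing” step. It is not The Pipe’s own code. It only assumes Cloudant’s standard `_bulk_docs` HTTP endpoint; the account URL, the database name `pipe_landing`, the sample records, and the helper `build_bulk_docs_request` are all hypothetical, and authentication is left out. The request is built but never sent, so the sketch runs without a network connection.

```python
import json
import urllib.request


def build_bulk_docs_request(base_url, db_name, records):
    """Build (but do not send) a POST to Cloudant's _bulk_docs endpoint.

    Hypothetical helper for illustration; a real call also needs credentials.
    """
    # Records are passed through untouched so they keep their native structure.
    body = json.dumps({"docs": list(records)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{db_name}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical records, shaped the way a REST source might return them.
records = [
    {"type": "account", "name": "Acme", "contacts": [{"email": "a@example.com"}]},
    {"type": "account", "name": "Globex", "contacts": []},
]

request = build_bulk_docs_request(
    "https://example.cloudant.com", "pipe_landing", records
)
print(request.get_method(), request.full_url)
```

Nothing in this step knows or cares where the data goes next, which is the point of the narrower scope.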
What the Pipe has lost in push-button, end-to-end data movement, it has gained in flexibility. Also, it still allows for future implementations that do move data end-to-end, whenever free APIs are available for analytics engines like IBM’s Apache Spark service, warehouses like dashDB, and other tools.
More options for your next move
For users who are focused on analytics use-cases, the new Simple Data Pipe can still connect to dashDB, although that connection is no longer baked in. It’s now a separate step completed in Cloudant. While this roster will expand, here is the current set of options for moving data out of Cloudant:
- dashDB, via native Cloudant integration with dashDB. Finish movement using Cloudant’s web dashboard.
- Apache Spark, via native Cloudant integration with Bluemix’s Spark service. Finish movement by calling the Cloudant connector in a Spark Scala Notebook.
- DataWorks, enterprise-grade APIs for data shaping & movement. A paid service on Bluemix as of February 2016. Provision DataWorks on Bluemix first, before deploying the new Simple Data Pipe.
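Whichever of these options you pick, the landed data stays readable through Cloudant’s regular HTTP API. The sketch below is not one of the three integrations listed above. It is a plain-Python illustration, assuming the standard `_all_docs?include_docs=true` endpoint and its usual response shape; the account URL, database name, sample response, and both helper functions are hypothetical. It parses a canned response instead of calling a live service.

```python
import json
from urllib.parse import urlencode


def all_docs_url(base_url, db_name):
    """URL that asks Cloudant for every document in a database, bodies included."""
    query = urlencode({"include_docs": "true"})
    return f"{base_url.rstrip('/')}/{db_name}/_all_docs?{query}"


def extract_docs(response_body):
    """Pull the document bodies out of an _all_docs response, skipping design docs."""
    rows = json.loads(response_body)["rows"]
    return [row["doc"] for row in rows if not row["id"].startswith("_design/")]


# Hypothetical response body in the shape _all_docs returns.
sample_response = json.dumps({
    "total_rows": 3,
    "offset": 0,
    "rows": [
        {"id": "_design/views", "key": "_design/views",
         "value": {"rev": "1-a"}, "doc": {"_id": "_design/views"}},
        {"id": "acct-1", "key": "acct-1",
         "value": {"rev": "1-b"}, "doc": {"_id": "acct-1", "name": "Acme"}},
        {"id": "acct-2", "key": "acct-2",
         "value": {"rev": "1-c"}, "doc": {"_id": "acct-2", "name": "Globex"}},
    ],
})

docs = extract_docs(sample_response)
print(all_docs_url("https://example.cloudant.com", "pipe_landing"))
print([doc["name"] for doc in docs])
```

From here the documents could be handed to whatever tool fits the job, which is the flexibility the decoupled design is after.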
Compared to the previous version of the Simple Data Pipe, and aside from the streamlined architecture, the biggest change is that we’ve removed The Pipe’s dependence on DataWorks. Connecting the DataWorks APIs to the data pipe is still an option, but removing this dependency means that data landed in Cloudant can move on through more channels.
Where to get the new Pipe
The same place as always on our developerWorks site. There you’ll find links to our GitHub repos and other instructions. In the coming weeks we’ll be updating content to reflect the new Simple Data Pipe. We’ll also kick off a new series of tutorials that shows all the ways you can work with the Data Pipe’s additional targets.
Let’s get that data moving, y’all.