Earlier this year, we released a new version of the Simple Data Pipe application. This app lets you load data from the source of your choice directly into Cloudant. You just create a data pipe configuration and run it.
The Simple Data Pipe app is a framework to create, modify, delete, and run data pipe configurations. When an app user chooses a pipe configuration (like source: Salesforce, and dataset: case) and runs it, the Simple Data Pipe framework invokes a data-source-specific connector (in this case, the Salesforce connector) to perform the actual data movement. The connector interprets the configuration and moves the appropriate data into Cloudant.
A data pipe configuration describes the data source, the authentication information needed to reach it, and the source data sets to load. Its exact contents depend on the connector and on the choices the user makes.
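To make this concrete, here is a minimal sketch of what a pipe configuration for the Salesforce example could look like. The field names (`name`, `connectorId`, `auth`, `dataSets`) and the `missingFields` helper are assumptions made for illustration, not the app's actual schema or API.

```javascript
// Hypothetical shape of a data pipe configuration.
// All field names are illustrative assumptions, not the app's actual schema.
const pipeConfig = {
  name: 'salesforce-cases',      // label chosen by the app user
  connectorId: 'salesforce',     // which connector should run this pipe
  auth: {                        // authentication information (placeholders)
    clientId: '<consumer key>',
    clientSecret: '<consumer secret>'
  },
  dataSets: ['case']             // source data sets selected by the user
};

// A small sanity check that could run before a pipe is started.
function missingFields(config) {
  return ['name', 'connectorId', 'auth', 'dataSets'].filter(
    (field) => config[field] === undefined
  );
}

console.log(missingFields(pipeConfig)); // prints [] when nothing is missing
```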
Connectors handle data movement from the cloud data source to Cloudant, by:
- connecting to the source using OAuth (if secure access is required)
- retrieving the requested data sets
- optionally enriching them with data from other sources, and
- storing the results as JSON documents in Cloudant for later processing
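The four steps above can be sketched as a single asynchronous function. This is only an outline under stated assumptions: `runConnector` and the helpers passed into it (`connect`, `fetchDataSet`, `enrich`, `save`) are hypothetical names, not part of the Simple Data Pipe API, and the demo uses in-memory stand-ins instead of a real data source and a real Cloudant database.

```javascript
// Sketch of the four connector steps with hypothetical, injectable helpers.
// None of these function names come from the Simple Data Pipe API.
async function runConnector({ connect, fetchDataSet, enrich, save }, dataSetNames) {
  const session = await connect();                      // 1. authenticate (for example via OAuth)
  const summary = {};
  for (const name of dataSetNames) {
    const records = await fetchDataSet(session, name);  // 2. retrieve the requested data set
    const docs = enrich
      ? await Promise.all(records.map((r) => enrich(r))) // 3. optional enrichment
      : records;
    await save(name, docs);                             // 4. store as JSON documents
    summary[name] = docs.length;
  }
  return summary;
}

// Usage with in-memory stand-ins for a real source and a real database.
const stored = {};
const demo = runConnector(
  {
    connect: async () => ({ token: 'placeholder-token' }),
    fetchDataSet: async (_session, name) => [
      { id: 1, type: name },
      { id: 2, type: name }
    ],
    enrich: async (record) => ({ ...record, enriched: true }),
    save: async (name, docs) => { stored[name] = docs; }
  },
  ['case']
);

// Logs the number of documents stored per data set.
demo.then((summary) => console.log(summary));
```

Passing the steps in as functions keeps the flow identical across data sources; only the helpers differ from one connector to the next.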
The Simple Data Pipe app ships with built-in connectors for Salesforce and Stripe. Additional connectors are available and you can deploy them as add-ons, providing access to a variety of data sources, like Reddit, Slack, and Trello.
We developed the connectors that exist so far to facilitate our own data analysis projects. As part of this work, we updated the Simple Data Pipe to make it easier to build new custom connectors for other popular data sources.
Cloud data source authentication
Connectors can now take advantage of the popular Passport authentication middleware for Node.js to establish secure connectivity with data sources. This eliminates the need to manually implement the entire OAuth authentication flow. Take a look, for example, at the Slack connector. To implement authentication, we:
- added the passport-slack strategy as a module dependency,
- configured the strategy, and
- specified the OAuth scopes required by the Slack API calls (fetch list of channels and fetch messages in channel) we intended to use.
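As a rough sketch of those three steps, the snippet below builds the options object that a Passport OAuth strategy typically expects. The environment variable names, the callback URL, and the helper `buildSlackStrategyOptions` are hypothetical, and the scope names `channels:read` and `channels:history` are an assumption about which Slack scopes cover the two API calls; check the Slack documentation before relying on them. The Passport wiring is shown in comments so the snippet runs without the `passport` and `passport-slack` packages installed.

```javascript
// Builds the options a Passport OAuth strategy typically expects.
// The environment variable names and the callback URL are hypothetical.
function buildSlackStrategyOptions(env) {
  if (!env.SLACK_CLIENT_ID || !env.SLACK_CLIENT_SECRET) {
    throw new Error('Slack client ID and secret are required');
  }
  return {
    clientID: env.SLACK_CLIENT_ID,
    clientSecret: env.SLACK_CLIENT_SECRET,
    callbackURL: env.CALLBACK_URL || 'http://localhost:8080/auth/callback',
    // Assumed scopes for "fetch list of channels" and "fetch messages in channel".
    scope: ['channels:read', 'channels:history']
  };
}

// Wiring it up (requires `npm install passport passport-slack`; kept as
// comments so this sketch runs without those packages):
//
//   const passport = require('passport');
//   const SlackStrategy = require('passport-slack').Strategy;
//   passport.use(new SlackStrategy(
//     buildSlackStrategyOptions(process.env),
//     (accessToken, refreshToken, profile, done) => done(null, { accessToken, profile })
//   ));

const options = buildSlackStrategyOptions({
  SLACK_CLIENT_ID: 'placeholder-id',
  SLACK_CLIENT_SECRET: 'placeholder-secret'
});
console.log(options.scope.join(', '));
```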
With hundreds of strategies to choose from, chances are good that there's one for the data source you need. If there isn't one yet, why not implement it yourself and publish on GitHub?
Jumpstarting connector development using boilerplates
To make it even easier for you to get started, we created a couple of connector boilerplates for popular cloud data sources. These boilerplates have authentication support baked in, which lets you focus on what's important: data retrieval. Check out our connectors page to see the list.
Data retrieval and enrichment
The Simple Data Pipe framework does not impose any restrictions on how to fetch, manipulate, and optionally enrich data. Browsing through our catalog, you'll see that some connectors use vendor-provided API libraries (like stripe.com), some use third-party API wrapper libraries (like this lightweight one for Slack), and some call the REST API endpoints directly via HTTP(S) requests.
Data storage and output format
When you start a data pipe run, the Simple Data Pipe app automatically creates a dedicated Cloudant database for each data set the connector processes. At runtime, the Simple Data Pipe framework provides the connector with a callback to be invoked whenever individual records or sets of records need to be written to the Cloudant database. There are no constraints on the structure of the records; you can pick whatever makes the most sense in the context of how the data will be consumed. For example, our connector for Reddit flattens data structures to support processing by the Spark-Cloudant connector, whereas others preserve the data structures returned by the API.
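The sketch below illustrates both ideas: flattening a nested record and handing records to a write callback. It is not the Reddit connector's actual code. The `flatten` convention (joining keys with an underscore), the `pushRecords` stand-in for the framework-provided callback, and the sample records are all made up for illustration.

```javascript
// Flattens nested plain objects into a single level, joining keys with "_".
// Arrays are kept as-is. This is one possible convention, not the
// Reddit connector's actual implementation.
function flatten(record, prefix = '', out = {}) {
  for (const [key, value] of Object.entries(record)) {
    const name = prefix ? `${prefix}_${key}` : key;
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      flatten(value, name, out);
    } else {
      out[name] = value;
    }
  }
  return out;
}

// Hypothetical stand-in for the framework-provided write callback.
// It accepts either a single record or an array of records.
const written = [];
const pushRecords = (records) => { written.push(...[].concat(records)); };

// A connector could hand over single records or batches as they arrive.
pushRecords(flatten({ id: 't1_abc', author: { name: 'alice', karma: 42 } }));
pushRecords([{ id: 't1_def' }, { id: 't1_ghi' }].map((r) => flatten(r)));

console.log(written[0]); // { id: 't1_abc', author_name: 'alice', author_karma: 42 }
```

A connector that prefers to preserve the structure returned by the API would simply skip the `flatten` call and pass the records through unchanged.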
Simple Data Pipe connectors can simply load data from the cloud data source (like the salesforce.com connector). Or they can be smarter and combine fetched data with information obtained from other cloud data sources to provide a value-added service, as in the following two examples:
- The social media connector for Reddit uses Watson Tone Analyzer to gauge the tone of user comments. A complete use-case scenario based on this data is nicely illustrated in Chetna's blog post.
- The connector for flightstats.com combines flight status information with weather data for the departure airport.
What do you think? Does the Simple Data Pipe sound like something that would streamline some of your projects?
Try it out: Deploy the Simple Data Pipe app, and load data with a built-in connector. Next, deploy an add-on connector. If we've won you over by then, go whole-hog and create a custom connector of your own! Let us know how it goes. We'd love to hear from you and collaborate on GitHub.