IBM BigInsights on Cloud provides Apache Hadoop-as-a-service on IBM’s SoftLayer global cloud infrastructure. You can spin up Hadoop clusters – designed specifically for storing and analyzing huge amounts of unstructured data in a distributed environment – in minutes, without the cost or complexity of managing your own infrastructure. These seven videos will give you a jump start on using BigInsights on Cloud.

Understand the Basic Plan

This video gives an overview of the new Basic Plan for BigInsights on Cloud and demonstrates how you can develop, test, and deploy Hadoop and Apache Spark apps. You’ll learn about the underlying IBM Open Platform, how to associate your apps with object storage accounts, how the system uses REST calls to interact with Swift, and how the virtual data and manager nodes run on Docker. You’ll also get a glimpse of the Bluemix cluster-manager interface.
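
To make the REST interaction with Swift more concrete, here is a minimal sketch that builds two standard Swift object storage requests: listing a container’s objects as JSON and uploading an object. It assumes you already have a storage URL (including the `/v1/AUTH_<account>` prefix) and an auth token, for example obtained with your object storage service credentials; the URL, container, and object names used here are hypothetical, and the source does not show the exact calls BigInsights itself makes.

```python
import urllib.parse
import urllib.request


def _object_url(storage_url: str, container: str, obj: str = "") -> str:
    """Join a Swift storage URL, a container name, and an optional object name."""
    # Percent-encode names; quote() keeps '/' so pseudo-directory object names work.
    url = f"{storage_url.rstrip('/')}/{urllib.parse.quote(container)}"
    if obj:
        url += "/" + urllib.parse.quote(obj)
    return url


def list_objects_request(storage_url: str, container: str, token: str) -> urllib.request.Request:
    """GET {storage_url}/{container}?format=json returns the object listing as JSON."""
    return urllib.request.Request(
        _object_url(storage_url, container) + "?format=json",
        method="GET",
        headers={"X-Auth-Token": token},
    )


def upload_object_request(
    storage_url: str, container: str, obj: str, data: bytes, token: str
) -> urllib.request.Request:
    """PUT {storage_url}/{container}/{object} creates or replaces the object."""
    return urllib.request.Request(
        _object_url(storage_url, container, obj),
        data=data,
        method="PUT",
        headers={"X-Auth-Token": token},
    )
```

Passing either request to `urllib.request.urlopen()` would send it; building the requests separately keeps the URL and header logic easy to inspect before anything touches the network.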

Connect to object storage

Bluemix helps you create service credentials for object storage when you set up the data store. The video shows how to create a storage container; you can then upload and view data in the store from the Bluemix service dashboard. In Hadoop commands, you reference object store data by its Swift object store URL.
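
Hadoop’s OpenStack Swift filesystem addresses objects with URLs of the form `swift://<container>.<service>/<path>`, where the service name refers to a `fs.swift.service.<service>.*` block in the Hadoop configuration. The sketch below builds such a URL; the container, service, and path names are hypothetical, and rejecting dotted container names is this sketch’s own conservative assumption, since the container is taken to be the part of the host before the first dot.

```python
def swift_url(container: str, service: str, path: str = "") -> str:
    """Build a Hadoop swift:// URL: swift://<container>.<service>/<path>."""
    for label, value in (("container", container), ("service", service)):
        if not value or "/" in value:
            raise ValueError(f"invalid {label} name: {value!r}")
    # Assumption: the host part is read as <container>.<service>, split at the
    # first '.', so a dotted container name would be ambiguous.
    if "." in container:
        raise ValueError(f"container name must not contain '.': {container!r}")
    return f"swift://{container}.{service}/{path.lstrip('/')}"
```

The resulting string can be used wherever a Hadoop path is expected, for example `hadoop fs -ls swift://logs.myservice/`, provided a service named `myservice` (hypothetical here) is configured on the cluster.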

Set up a BigInsights cluster

Watch and learn how to name your cluster and set up its access security, and how to configure the amount of data storage, the number of nodes, the version of IBM Open Platform you want to use, and the option to associate a specific cloud data store with your cluster. Components such as HDFS, HBase, Ambari Metrics, YARN, ZooKeeper, Hive, MapReduce2, and Knox are installed automatically, and you can choose optional components such as Spark, Pig, and Sqoop.

Your cluster will be associated with your object store.
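
The choices made in the setup form can be summarized as a small configuration model. This is a hypothetical sketch, not the Bluemix API: the class, field names, and example values are invented for illustration, the default component set is the list named above, and the optional set contains only the three components the video mentions (the real offering may list more).

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Installed automatically, per the description of the cluster setup video.
DEFAULT_COMPONENTS = frozenset({
    "HDFS", "HBase", "Ambari Metrics", "YARN",
    "ZooKeeper", "Hive", "MapReduce2", "Knox",
})
# Optional components named in the text; the actual list may be longer.
OPTIONAL_COMPONENTS = frozenset({"Spark", "Pig", "Sqoop"})


@dataclass
class ClusterRequest:
    """Hypothetical model of the settings chosen when creating a cluster."""
    name: str
    data_nodes: int
    open_platform_version: str
    object_store: Optional[str] = None  # associated cloud data store, if any
    optional: Set[str] = field(default_factory=set)

    def validate(self) -> None:
        if not self.name:
            raise ValueError("a cluster name is required")
        if self.data_nodes < 1:
            raise ValueError("at least one data node is required")
        unknown = set(self.optional) - OPTIONAL_COMPONENTS
        if unknown:
            raise ValueError(f"unknown optional components: {sorted(unknown)}")

    def components(self) -> Set[str]:
        """Return every component this request would install."""
        self.validate()
        return set(DEFAULT_COMPONENTS) | set(self.optional)
```

Keeping the automatically installed set separate from the user-selected set mirrors the form: the defaults are always present, and validation only has to check the optional picks.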


The next three videos will demonstrate various ways to use your BigInsights on Cloud cluster:

  • How to access and run basic examples from GitHub
  • How to use Sqoop to integrate data from Compose for MySQL
  • How to process BigInsights data using Python and spark-submit

How to access and run basic examples from GitHub

You’ll learn how to access and run the examples from GitHub, and how to use them as a “smoke test” (build verification testing, in which several of the examples run together).
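
A smoke test of this kind boils down to running each example and treating a zero exit status as a pass. The sketch below assumes the examples can be launched as commands; the example names and commands you pass in are placeholders, since the source does not list the repository’s actual scripts. Examples run one after another, and a missing executable or a timeout counts as a failure.

```python
import subprocess
from typing import Dict, Sequence


def run_smoke_test(examples: Dict[str, Sequence[str]], timeout: float = 600.0) -> Dict[str, bool]:
    """Run each example command in turn; an example passes if it exits with status 0."""
    results: Dict[str, bool] = {}
    for name, cmd in examples.items():
        try:
            completed = subprocess.run(list(cmd), capture_output=True, timeout=timeout)
        except (OSError, subprocess.TimeoutExpired):
            # A missing executable or a hung example is treated as a failure.
            results[name] = False
        else:
            results[name] = completed.returncode == 0
    return results


def smoke_test_passed(results: Dict[str, bool]) -> bool:
    """The build is verified only if at least one example ran and all of them passed."""
    return bool(results) and all(results.values())
```

Collecting per-example results, rather than stopping at the first failure, shows in one run which parts of the cluster need attention.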

How to use Sqoop to integrate data from Compose for MySQL

Sqoop is a command-line tool that transfers data between relational databases and Hadoop. It supports incremental loads of a single table or of a free-form SQL query. This video shows how to use it to bring data from Compose for MySQL into your cluster.
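
The sketch below assembles a `sqoop import` command line for either a single table or a free-form query, with optional incremental (append) loading. It assumes the Sqoop 1 command-line interface; the flags are standard Sqoop import options, while the JDBC URL, credentials file, table, column, and HDFS paths are hypothetical placeholders for your Compose for MySQL details.

```python
from typing import List, Optional


def sqoop_import_args(
    jdbc_url: str,
    username: str,
    password_file: str,
    target_dir: str,
    *,
    table: Optional[str] = None,
    query: Optional[str] = None,
    check_column: Optional[str] = None,
    last_value: Optional[object] = None,
    num_mappers: int = 1,
    split_by: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `sqoop import` of one table or one free-form query."""
    if (table is None) == (query is None):
        raise ValueError("specify exactly one of table or query")
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,  # keeps the password off the command line
        "--target-dir", target_dir,        # HDFS directory that receives the files
        "--num-mappers", str(num_mappers),
    ]
    if table is not None:
        args += ["--table", table]
    else:
        assert query is not None
        # Sqoop replaces the $CONDITIONS token with each mapper's split predicate.
        if "$CONDITIONS" not in query:
            raise ValueError("a free-form query must contain $CONDITIONS")
        if num_mappers > 1 and split_by is None:
            raise ValueError("a parallel free-form import needs split_by")
        args += ["--query", query]
    if split_by is not None:
        args += ["--split-by", split_by]
    if check_column is not None:
        # Append mode imports only rows whose check column exceeds last_value.
        args += ["--incremental", "append", "--check-column", check_column]
        if last_value is not None:
            args += ["--last-value", str(last_value)]
    return args
```

On a node where `sqoop` is on the PATH, the list can be run with `subprocess.run(args, check=True)`. Passing an argument list instead of a shell string also stops the shell from expanding `$CONDITIONS`.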

How to process BigInsights data using Python and spark-submit

Finally, you will see how to use the data you imported with Sqoop to run a Spark job using Python and spark-submit.
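
Submitting the Python job comes down to a `spark-submit` invocation in which Spark options precede the script and everything after the script is passed to it as arguments. This sketch builds that command line; `--master`, `--deploy-mode`, `--executor-memory`, and `--num-executors` are standard spark-submit options (the last one applies on YARN), while the script name and HDFS input path in the usage note are hypothetical.

```python
from typing import List, Optional, Sequence


def spark_submit_args(
    script: str,
    app_args: Sequence[str] = (),
    *,
    master: str = "yarn",
    deploy_mode: str = "client",
    executor_memory: Optional[str] = None,
    num_executors: Optional[int] = None,
) -> List[str]:
    """Build an argv list that submits a Python script with spark-submit."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"deploy_mode must be 'client' or 'cluster', got {deploy_mode!r}")
    args = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    if executor_memory is not None:
        args += ["--executor-memory", executor_memory]  # e.g. "2g"
    if num_executors is not None:
        args += ["--num-executors", str(num_executors)]  # honored on YARN
    # Options must come before the script; anything after it goes to the script.
    args.append(script)
    args.extend(app_args)
    return args
```

For example, `spark_submit_args("process_orders.py", ["/user/admin/orders"])` would point a hypothetical script at the directory written by the Sqoop import.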


developerWorks Connect

Tutorials, demos, tips, how-to guides, and discussions with technical experts in software development, by developers, for developers.
