Lab 1: Overview and getting started

In this hands-on lab, you’ll learn how to work with Big Data using Apache Hadoop and InfoSphere BigInsights 3.0, IBM’s Hadoop-based platform. In particular, you’ll learn the basics of working with the Hadoop Distributed File System (HDFS) and see how to administer your Hadoop-based environment using the BigInsights Web Console. After launching a sample MapReduce application, you’ll explore a more sophisticated scenario involving social media data. In doing so, you’ll learn how to use a spreadsheet-style interface to discover insights about the global coverage of a popular brand without writing any code. Finally, you’ll learn how to apply industry-standard SQL to data managed by BigInsights through IBM’s Big SQL technology. You’ll create tables and execute complex queries over data in HDFS, including data derived from a relational data warehouse.

Ready to get started?

After completing this hands-on lab, you’ll be able to:

• Work directly with Apache Hadoop through file system commands

• Inspect and administer your cluster through the BigInsights Web Console

• Explore big data using a spreadsheet-style tool

• Use Big SQL to create tables and issue complex queries

Allow 2½ to 3 hours to complete all labs associated with Exploring Hadoop and BigInsights. Allow 15 minutes to complete the exercises in this section, plus additional time to download and unzip the required VMware image.

This lab was developed by Cynthia M. Saracco, IBM Silicon Valley Lab. Please post questions or comments about this lab or the technologies it describes to the forum on Hadoop Dev at https://developer.ibm.com/answers?community=hadoop.

1.1. About your environment

This lab was developed for the InfoSphere BigInsights 3.0 Quick Start Edition VMware image. If necessary, download and install the single-node cluster VMware image from this site: http://www-01.ibm.com/software/data/infosphere/biginsights/quick-start/downloads.html

The VMware image is set up in the following manner:

Account                      User      Password
VM Image root account        root      password
VM Image lab user account    biadmin   biadmin
BigInsights Administrator    biadmin   biadmin
Big SQL Administrator        bigsql    bigsql
Lab user                     biadmin   biadmin

Property                       Value
Host name                      bivm.ibm.com
BigInsights Web Console URL    http://bivm.ibm.com:8080
Big SQL database name          bigsql
Big SQL port number            51000
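The settings above can be kept in shell variables so that later commands do not hard-code them. The following is a minimal sketch, assuming a Bash shell is available inside the VM. The variable names and the port_open and check_environment helpers are hypothetical and are not part of BigInsights; the connection test relies on Bash's /dev/tcp redirection and the coreutils timeout command.

```shell
#!/usr/bin/env bash
# Connection settings copied from the tables above.
# The variable names are illustrative; BigInsights does not define them.
BI_HOST="bivm.ibm.com"
BI_CONSOLE_PORT=8080
BI_CONSOLE_URL="http://${BI_HOST}:${BI_CONSOLE_PORT}"
BIGSQL_DB="bigsql"
BIGSQL_PORT=51000

# Hypothetical helper: succeeds if a TCP connection to host:port can be
# opened within 3 seconds. Uses Bash's /dev/tcp feature in a child shell.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Hypothetical helper: reports whether the Web Console port and the
# Big SQL port accept connections on the lab host.
check_environment() {
  local port
  for port in "$BI_CONSOLE_PORT" "$BIGSQL_PORT"; do
    if port_open "$BI_HOST" "$port"; then
      echo "${BI_HOST}:${port} is reachable"
    else
      echo "${BI_HOST}:${port} is not reachable (services may not be started yet)"
    fi
  done
}
```

Nothing runs until you call check_environment. Before the BigInsights services are started, both ports will normally be reported as not reachable.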


image001

About the screen captures, sample code, and environment configuration

Screen captures in this lab depict examples and results that may vary from what you see when you complete the exercises. In addition, some code examples may need to be customized to match your environment. For example, you may need to alter directory path information or user ID information.

1.2. Getting started

To get started with the lab exercises, you need to install and launch the VMware image as well as start the required services.

__1. If necessary, obtain a copy of the BigInsights 3.0 Quick Start Edition VMware image from IBM’s external download site (http://www-01.ibm.com/software/data/infosphere/biginsights/quick-start/downloads.html). Use the image for the single-node cluster.

__2. Follow the instructions provided to decompress (unzip) the file and install the image on your laptop. Note that there is a README file with additional information.

__3. If necessary, install VMware Player or other required software to run VMware images. Details are in the README file provided with the BigInsights VMware image.

__4. Launch the VMware image. When logging in for the first time, use the root ID with the password "password". Follow the instructions to configure your environment, accept the licensing agreement, and enter the passwords for the root and biadmin IDs (root/password and biadmin/biadmin) when prompted. You only need to do this once.

image002

image003

__5. When the one-time configuration process is completed, you will be presented with a SUSE Linux login screen. Log in as biadmin with a password of biadmin.

image004

__6. Verify that your screen appears similar to this:

image005

__7. Click Start BigInsights to start all required services. (Alternatively, you can open a terminal window and issue this command: $BIGINSIGHTS_HOME/bin/start-all.sh)

image006

Wait until the operation completes. This may take several minutes, depending on your machine’s resources.

__8. Verify that all required BigInsights services are up and running. From a terminal window, issue this command: $BIGINSIGHTS_HOME/bin/status.sh

__9. Inspect the results, a subset of which is shown below. Verify that, at a minimum, the following components started successfully: hdm, zookeeper, hadoop, catalog, hive, bigsql, oozie, console, and httpfs.

image007
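If you prefer not to scan the status output by eye, a small shell helper can flag any required component that is not mentioned at all. This is a rough sketch: the missing_components function and the REQUIRED_COMPONENTS variable are hypothetical and not part of BigInsights. It checks only that each component name appears somewhere in the output, and makes no assumption about how status.sh words success or failure, so you should still read the output yourself.

```shell
#!/usr/bin/env bash
# Components this lab requires, as listed in step 9.
REQUIRED_COMPONENTS="hdm zookeeper hadoop catalog hive bigsql oozie console httpfs"

# Hypothetical helper: reads status text on standard input and reports
# every required component whose name never appears as a whole word.
# Returns non-zero if any component is missing. This is a presence check
# only; it does not decide whether a component started successfully.
missing_components() {
  local text component missing=""
  text="$(cat)"
  for component in $REQUIRED_COMPONENTS; do
    if ! printf '%s\n' "$text" | grep -qw -- "$component"; then
      missing="${missing}${component} "
    fi
  done
  if [ -n "$missing" ]; then
    echo "not mentioned in status output: ${missing% }"
    return 1
  fi
  echo "all required components are mentioned"
}

# Usage inside the VM (not executed here):
#   "$BIGINSIGHTS_HOME/bin/status.sh" 2>&1 | missing_components
```

A component reported as not mentioned is a prompt to rerun the start command or review the logs, not proof of a failure.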

Now you’re ready to start working with big data!

To find the other tutorials in this series, go to the Overview tutorial.

image008

If you have any questions or need help getting your environment up and running, visit Hadoop Dev (https://developer.ibm.com/hadoop/) and review the product documentation or post a message to the forum.

You cannot proceed with subsequent lab exercises until you’ve logged into the VMware image and launched the necessary BigInsights services.

