Introduction

“spark-notebook.io” is an interactive, web-based editor that combines Scala, SQL, Markdown, and JavaScript with Apache Spark.

Objective

This technical document shows how to install and set up “spark-notebook.io” with Spark on a BigInsights cluster.

Version Tested

  • BigInsights v4.1.0.0
  • Apache Zeppelin v0.6.1
  • Apache Spark v1.3.x, v1.4.x
  • RHEL v6.x, RHEL v7.x, CentOS v6.x, CentOS v7.x

For BigInsights v3.x, please see the earlier post: https://developer.ibm.com/hadoop/blog/2015/08/29/setup-spark-notebook-zeppelin-biginsights/

You can download a prebuilt binary at:

https://github.com/lindatechie/spark-notebook-biginsights

Step 1: Preparation

  1. Download SBT if it is not already installed (http://www.scala-sbt.org/download.html).
  2. Install SBT:
    • sudo tar zxvf sbt.tar.gz -C /usr/local
    • export SBT_HOME=/usr/local/sbt; export PATH=$PATH:$SBT_HOME/bin
  3. Install GIT:
    • sudo yum install git
  4. Set environment variables by adding the following to ~/.bashrc (an optional sketch for assembling the long classpath follows this list):
    • vi ~/.bashrc
      export SPARK_HOME=/usr/iop/4.1.0.0/spark
      export SBT_HOME=/usr/local/sbt
      export HADOOP_CONF_DIR=/etc/hadoop/conf
      export EXTRA_CLASSPATH=/usr/iop/current/hadoop-client/client/hadoop-auth.jar:/usr/iop/current/hadoop-client/client/avro.jar:/usr/iop/current/hadoop-client/client/hadoop-annotations.jar:/usr/iop/current/hadoop-client/client/zookeeper.jar:/usr/iop/current/hadoop-client/client/leveldbjni-all.jar:/usr/iop/current/hadoop-client/client/hadoop-yarn-server-common.jar:/usr/iop/current/hadoop-client/client/hadoop-yarn-common.jar:/usr/iop/current/hadoop-client/client/hadoop-yarn-client.jar:/usr/iop/current/hadoop-client/client/hadoop-yarn-api.jar:/usr/iop/current/hadoop-client/client/hadoop-mapreduce-client-shuffle.jar:/usr/iop/current/hadoop-client/client/hadoop-mapreduce-client-jobclient.jar:/usr/iop/current/hadoop-client/client/hadoop-mapreduce-client-core.jar:/usr/iop/current/hadoop-client/client/hadoop-mapreduce-client-common.jar:/usr/iop/current/hadoop-client/client/hadoop-mapreduce-client-app.jar:/usr/iop/current/hadoop-client/client/hadoop-hdfs.jar:/usr/iop/current/hadoop-client/client/hadoop-common.jar
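The EXTRA_CLASSPATH entry above is easy to mistype. As an optional alternative, the same value can be assembled with a short shell loop; this is only a sketch, and it assumes the standard /usr/iop/current/hadoop-client/client layout used in the export above:

  # Optional: build EXTRA_CLASSPATH from the same Hadoop client jars listed above
  CLIENT_DIR=/usr/iop/current/hadoop-client/client
  EXTRA_CLASSPATH=""
  for jar in hadoop-auth avro hadoop-annotations zookeeper leveldbjni-all \
             hadoop-yarn-server-common hadoop-yarn-common hadoop-yarn-client hadoop-yarn-api \
             hadoop-mapreduce-client-shuffle hadoop-mapreduce-client-jobclient \
             hadoop-mapreduce-client-core hadoop-mapreduce-client-common hadoop-mapreduce-client-app \
             hadoop-hdfs hadoop-common; do
    EXTRA_CLASSPATH="${EXTRA_CLASSPATH:+$EXTRA_CLASSPATH:}$CLIENT_DIR/$jar.jar"
  done
  export EXTRA_CLASSPATH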

Step 2: Get & Build Spark Notebook Latest Source Code

  1. Get Spark Notebook Source Code:
    • sudo git clone https://github.com/andypetrella/spark-notebook.git
    • cd spark-notebook
  2. Compile and create the distributable Spark Notebook binary:
    • sudo sbt console
    • sudo sbt -Dspark.version=1.4.1 -Dhadoop.version=2.7.1 -Djets3t.version=0.7.1 -Dwith.hive=true -Dwith.parquet=true clean dist
  3. Distribute binary:
    • Once the build completes, the binary zip file is located at ./target/universal/spark-notebook-0.6.2-SNAPSHOT-scala-2.10.4-spark-1.4.1-hadoop-2.7.1-with-hive-with-parquet.zip
    • To install the binary on a client machine:
      sudo unzip spark-notebook-0.6.2-SNAPSHOT-scala-2.10.4-spark-1.4.1-hadoop-2.7.1-with-hive-with-parquet.zip -d /usr/local
      sudo ln -s /usr/local/spark-notebook-0.6.2-SNAPSHOT-scala-2.10.4-spark-1.4.1-hadoop-2.7.1-with-hive-with-parquet /usr/local/spark-notebook
      sudo chown -R spark:hadoop /usr/local/*notebook*
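Before continuing, it may help to confirm that the unzipped layout and the symlink look right; a minimal check, assuming the /usr/local/spark-notebook symlink created above:

  # Quick sanity check of the installed layout
  ls -l /usr/local/spark-notebook       # should point at the unzipped distribution
  ls /usr/local/spark-notebook/bin      # should contain the spark-notebook launcher
  ls /usr/local/spark-notebook/conf     # should contain application.conf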

Step 3: Modify Configuration for Tuning

  1. su - spark
  2. vi /usr/local/spark-notebook/conf/application.conf
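Which settings are worth tuning depends on your cluster, so treat the following only as a pattern rather than a recommended configuration: keep a pristine copy of the shipped file before editing, and note that individual values can also be overridden per launch with Java system properties (Spark Notebook is a Play application, so a -Dhttp.port override for the listening port is assumed here; verify it against the comments in application.conf):

  # Keep an untouched copy of the shipped configuration before tuning it
  cp /usr/local/spark-notebook/conf/application.conf \
     /usr/local/spark-notebook/conf/application.conf.orig
  vi /usr/local/spark-notebook/conf/application.conf

  # Individual settings can also be overridden at launch without editing the file,
  # e.g. moving the HTTP port off 9000 (Play-style override; assumed, verify locally):
  # ./bin/spark-notebook -Dconfig.file=./conf/application.conf -Dhttp.port=9090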

Step 4: Start Notebook

  1. Start Notebook as user “spark”:
    • su - spark
    • cd /usr/local/spark-notebook
    • ./bin/spark-notebook -Dconfig.file=./conf/application.conf
    • Open a browser at http://hostname:9000 (replace hostname with the node running the notebook).
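If the notebook should keep running after the terminal session closes, a common pattern is to launch it with nohup and then verify the port; a minimal sketch, assuming the default port 9000 and the paths used above:

  # Run Spark Notebook in the background as user spark and capture its output
  su - spark
  cd /usr/local/spark-notebook
  nohup ./bin/spark-notebook -Dconfig.file=./conf/application.conf > notebook.log 2>&1 &

  # Confirm the web UI is answering on the default port (9000)
  curl -I http://localhost:9000
  tail -f notebook.log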
