This post is an update of: https://developer.ibm.com/hadoop/2015/08/29/setup-spark-notebook-zeppelin-biginsights

Introduction

Apache Zeppelin is a web-based notebook that enables interactive data ingestion, data discovery, data analytics, visualization and collaboration.

Objective

This technical document shows readers how to install and set up Apache Zeppelin on a BigInsights cluster.

Version Tested

  • BigInsights v4.1.0.2
  • Apache Zeppelin v0.6.0
  • Apache Spark v1.5.1
  • Red Hat v7.x, CentOS v7.x

Step 1: Compile and create a Zeppelin binary file

  1. Install the build prerequisites: sudo yum install npm git
  2. Download and install Apache Maven. Example: sudo wget http://mirrors.ibiblio.org/apache/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz; sudo tar zxvf apache-maven-3.3.9-bin.tar.gz -C /usr/local/; export PATH=$PATH:/usr/local/apache-maven-3.3.9/bin
  3. sudo git config --global url."https://".insteadOf git:// (The build will fail without this step. Note the two dashes before "global".)
  4. Download the latest Zeppelin source code. Example: sudo git clone https://github.com/apache/incubator-zeppelin.git
  5. Compile and create a package for deployment from inside the incubator-zeppelin directory: sudo mvn clean package -Pbuild-distr -DskipTests -Dspark.version=1.5.1 -Dhadoop.version=2.7.1 (these build commands are also collected into a single block after this list)
  6. A binary file is created at "./incubator-zeppelin/zeppelin-distribution/target/zeppelin-0.6.0-incubating-SNAPSHOT.tar.gz". You can deploy this file to another BigInsights v4.1.0.2 cluster. A copy is available to download here.
  7. Example of deploying to a new BigInsights v4.1.0.2 cluster: sudo tar zxvf zeppelin-0.6.0-incubating.tar.gz -C /usr/local/
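
For convenience, here are the build commands from this step collected into one place (the Maven mirror, version numbers, and install paths are simply the examples used above; adjust them for your environment):

    # Build prerequisites
    sudo yum install npm git

    # Apache Maven (example mirror and version)
    sudo wget http://mirrors.ibiblio.org/apache/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
    sudo tar zxvf apache-maven-3.3.9-bin.tar.gz -C /usr/local/
    export PATH=$PATH:/usr/local/apache-maven-3.3.9/bin

    # Rewrite git:// URLs to https:// so the build can fetch its dependencies
    sudo git config --global url."https://".insteadOf git://

    # Fetch the Zeppelin source and build the binary distribution
    sudo git clone https://github.com/apache/incubator-zeppelin.git
    cd incubator-zeppelin
    sudo mvn clean package -Pbuild-distr -DskipTests -Dspark.version=1.5.1 -Dhadoop.version=2.7.1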

Step 2: Configure Zeppelin

  1. Copy and modify the following three files under the conf subdirectory of the deployed location. The examples below use the variable $SPARK_NOTEBOOK for that location (e.g. export SPARK_NOTEBOOK=/usr/local/zeppelin-0.6.0-incubating).
  2. cp $SPARK_NOTEBOOK/conf/zeppelin-env.sh.template $SPARK_NOTEBOOK/conf/zeppelin-env.sh
  3. cp $SPARK_NOTEBOOK/conf/zeppelin-site.xml.template $SPARK_NOTEBOOK/conf/zeppelin-site.xml
  4. cp $SPARK_NOTEBOOK/conf/log4j.properties.template $SPARK_NOTEBOOK/conf/log4j.properties
  5. Add the following 3 lines to $SPARK_NOTEBOOK/conf/zeppelin-env.sh
    export SPARK_HOME=/usr/iop/current/spark-client
    export HADOOP_HOME=/usr/iop/current/hadoop-client
    export HADOOP_CONF_DIR=/usr/iop/current/hadoop-client/conf

  6. Modify file $SPARK_NOTEBOOK/conf/zeppelin-site.xml
    • Change the server port if needed; the default is 8080. (An example excerpt is shown after this list.)

      (Screenshot: zeppelin-site.xml snippet)
    • The Spark interpreter (i.e. Scala) is the default after installation. If you prefer Python or any other interpreter as the default, modify the zeppelin.interpreters value in zeppelin-site.xml and put your preferred interpreter first in the list. The screenshot showed an example using Python as the default interpreter; the excerpt after this list includes one as well.
      (Screenshot: changing the Zeppelin default interpreter)
  7. Start the notebook process: $SPARK_NOTEBOOK/bin/zeppelin-daemon.sh start
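
As a rough sketch of what the two zeppelin-site.xml screenshots showed, the excerpt below illustrates both edits. The port 8081 is only an example, and the interpreter list is shortened for readability; keep the full comma-separated list of interpreter classes from the template and simply move your preferred default (here the PySpark interpreter, making Python the default) to the front.

    <!-- zeppelin-site.xml (sketch): change the notebook HTTP port -->
    <property>
      <name>zeppelin.server.port</name>
      <value>8081</value>
      <description>Server port (the default is 8080)</description>
    </property>

    <!-- zeppelin-site.xml (sketch): the first interpreter listed becomes the default -->
    <property>
      <name>zeppelin.interpreters</name>
      <!-- Shortened here; append the remaining interpreter classes from the template after these two -->
      <value>org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkInterpreter</value>
    </property>

Once the daemon is up (you can also check with $SPARK_NOTEBOOK/bin/zeppelin-daemon.sh status), the notebook UI should be reachable in a browser at http://<zeppelin-host>:<zeppelin.server.port>.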

Step 3: Testing Zeppelin by reading an HDFS file

  • Note: the example file "/data/ex1/UserPurchaseHistory.csv" resides in HDFS, not on the local (POSIX) file system. A sketch of such a notebook paragraph is shown below.
(Screenshot: Zeppelin example reading an HDFS file with Spark SQL)
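
The original screenshot is not reproduced here, so the following is a minimal sketch of a Zeppelin paragraph for the Spark (Scala) interpreter that reads the file and exposes it to Spark SQL. The column names and types for UserPurchaseHistory.csv are assumptions made for illustration; adjust the case class to match the actual file.

    // Zeppelin paragraph, Spark (Scala) interpreter on Spark 1.5.x
    // Assumed CSV layout: user,product,price -- adjust to the real file
    import sqlContext.implicits._

    case class Purchase(user: String, product: String, price: Double)

    val purchases = sc.textFile("/data/ex1/UserPurchaseHistory.csv")  // read from HDFS, not the local file system
      .map(_.split(","))
      .map(p => Purchase(p(0), p(1), p(2).toDouble))
      .toDF()

    purchases.registerTempTable("purchases")

A following paragraph can then query the table with the SQL interpreter, for example: %sql select * from purchases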

 

2 comments on "Setup Spark Zeppelin Notebook with BigInsights v4.x"

  1. Thank you for this tutorial Linda. It is very helpful! I used this for IOP v4.2 and it works. However, the default interpreter is the markdown interpreter, even though the SparkInterpreter is the first item in the zeppelin-site.xml file under the zeppelin.interpreter property. Not sure why that is, but I was able to change the default right on the notebook interpreter’s page — so that works.

    • IOP v4.2
      Apache Zeppelin v0.7.0-SNAPSHOT
      Apache Spark v1.6.1
      Redhat v7.2
