Introduction

Jupyter Notebook is a web application that allows creating and sharing documents that contain live code, equations, visualizations and explanatory text.

A notebook is interactive, so you can execute code directly from a web browser. Jupyter supports multiple kernels for different programming languages. In the Spark world, the most popular kernels are currently Python, Scala, and R.

Objective

This technical document shows how to install and set up Jupyter Notebook with Python, Scala, and R kernels on a BigInsights cluster.

Versions Tested

    • BigInsights v4.1.0.1 & v4.1.0.2
    • Python v3+
    • Apache Spark v1.4 and v1.5.1
    • RHEL v6.6 & v7.x, CentOS v6.6 & v7.x

Steps
Note: Commands that are not prefixed with sudo were run as the Jupyter administrator. In this post, the user “spark” serves as the Jupyter administrator.

  1. Prepare Pre-requisites
    • Install the development packages first. Python must be compiled after they are in place; otherwise modules such as _sqlite3 are not built and Jupyter fails to start. For RHEL/CentOS v6: $ sudo yum install nano zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel libpng-devel libjpeg-devel atlas-devel
    • $ sudo yum groupinstall "Development Tools"
    • $ sudo yum install python-pip
    • If the cluster nodes have Python v2.x, you can install v3.x alongside it rather than forcing a complete upgrade from v2 to v3. Keeping multiple Python versions is the preferred solution, especially when some applications depend on an older Python version. A simple example of compiling and installing Python v3.4 follows:
      $ wget http://python.org/ftp/python/3.4.1/Python-3.4.1.tar.xz; tar xvf Python-3.4.1.tar.xz; cd Python-3.4.1; sudo ./configure --prefix=/usr/local --enable-shared LDFLAGS="-Wl,-rpath /usr/local/lib"; sudo make; sudo make altinstall
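Once Python 3.4 is built, it can be worth confirming that the interpreter picked up the optional C-backed modules, since a -devel package that is missing at compile time silently drops the matching module (the sqlite3 failure reported in the comments is one such case). The following is a minimal sketch to run with python3.4; the helper name `missing_modules` and the choice of modules are illustrative and not part of the original procedure.

```python
import importlib

# Standard-library modules that rely on the -devel packages from step 1.
# This is an illustrative selection, not an exhaustive list.
OPTIONAL_MODULES = ["sqlite3", "ssl", "zlib", "bz2", "lzma", "readline", "curses"]


def missing_modules(names):
    """Return the entries of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(OPTIONAL_MODULES)
    if absent:
        print("Missing modules (install the -devel packages, then rebuild):",
              ", ".join(absent))
    else:
        print("All checked modules import correctly.")
```

If any module is reported missing, install the corresponding -devel package and rerun the configure, make, and make altinstall sequence.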
  2. Install the Jupyter application
    • $ sudo pip3.4 install --upgrade pip
    • $ sudo pip3.4 install --upgrade jupyter
    • $ sudo pip3.4 install "ipython[notebook]"
  3. Create a self-signed certificate for SSL login
    • $ sudo mkdir /etc/jupyter; sudo chown -R spark:hadoop /etc/jupyter (Note: Replace "spark" with your Jupyter administrator ID)
    • Example: $ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /etc/jupyter/linda1.key -out /etc/jupyter/lindacert.pem
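A quick way to confirm that the generated certificate and key belong together is to load them with Python's ssl module. This is a sketch only; the function name `cert_pair_loads` is a hypothetical helper, and the paths are the ones from the openssl example above.

```python
import ssl


def cert_pair_loads(certfile, keyfile):
    """Return True if the certificate and private key load together."""
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    try:
        # Raises ssl.SSLError when the key does not match the certificate,
        # and OSError when either file is missing or unreadable.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    except (ssl.SSLError, OSError):
        return False
    return True


if __name__ == "__main__":
    ok = cert_pair_loads("/etc/jupyter/lindacert.pem", "/etc/jupyter/linda1.key")
    print("certificate and key load together:", ok)
```

Because the key was created with -nodes, it is not encrypted, so loading it does not prompt for a passphrase.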
  4. Create a hashed password for the web login
    • $ python3.4
    • >>> from notebook.auth import passwd
    • >>> passwd() (enter and verify the password when prompted; the call returns the hashed value)
      'sha1:e555237d8465:dc853c6e9f073d4b885f70c566456c451ae95b53'
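The returned string has three colon-separated fields: the algorithm, a random salt, and a hex digest. The sketch below reproduces that layout with hashlib to show how the fields relate. It assumes the digest is computed over the passphrase followed by the salt, which is my understanding of the notebook package of this era and is not confirmed by the post; `make_hash` and `check_hash` are hypothetical names. Use passwd() itself to produce the value for the config file.

```python
import binascii
import hashlib
import os


def make_hash(passphrase, algorithm="sha1", salt_len=12):
    """Build an 'algorithm:salt:digest' string in the layout shown above."""
    # salt_len hex characters need salt_len // 2 random bytes.
    salt = binascii.hexlify(os.urandom(salt_len // 2)).decode("ascii")
    digest = hashlib.new(algorithm, (passphrase + salt).encode("utf-8")).hexdigest()
    return ":".join((algorithm, salt, digest))


def check_hash(hashed, passphrase):
    """Return True if `passphrase` matches a string produced by make_hash."""
    try:
        algorithm, salt, digest = hashed.split(":", 2)
    except ValueError:
        return False
    candidate = hashlib.new(algorithm, (passphrase + salt).encode("utf-8")).hexdigest()
    return candidate == digest


if __name__ == "__main__":
    hashed = make_hash("example-password")
    print(hashed)
    print(check_hash(hashed, "example-password"))
```

Because the salt is random, hashing the same password twice gives two different strings, and both validate.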
  5. Configure Jupyter as an SSL- and password-secured web site
    • $ jupyter notebook --generate-config (This command generates a config file in the default location: “~/.jupyter/jupyter_notebook_config.py”)
    • $ cp ~/.jupyter/jupyter_notebook_config.py /etc/jupyter/
    • $ vi /etc/jupyter/jupyter_notebook_config.py
    • Add the certificate, key, and hashed password settings, using your own values. Example in the screenshot below.

      (Screenshot: Configure Jupyter Python Kernel)
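As a text alternative to the screenshot, the relevant settings in /etc/jupyter/jupyter_notebook_config.py would look roughly like the fragment below. The certificate path, key path, and hash are the example values from steps 3 and 4; the `ip`, `open_browser`, and `port` lines are assumptions added for a typical server setup and are not taken from the original post.

```python
# /etc/jupyter/jupyter_notebook_config.py (fragment)
c = get_config()  # supplied by Jupyter when it loads this file

# Certificate and key created in step 3
c.NotebookApp.certfile = '/etc/jupyter/lindacert.pem'
c.NotebookApp.keyfile = '/etc/jupyter/linda1.key'

# Hashed password returned by passwd() in step 4
c.NotebookApp.password = 'sha1:e555237d8465:dc853c6e9f073d4b885f70c566456c451ae95b53'

# Assumptions beyond the original text: listen on all interfaces and
# do not try to open a browser on the server.
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 8888  # default port
```

Replace each value with your own before starting the server.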
    • Modify the Python kernel for ease of use (optional)
      (Screenshot: Python Kernel config file “kernel.json”)
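The exact edits shown in the screenshot are not recoverable from the text, so the following is only a sketch of the general shape of a Python kernel.json. It builds the spec as a dictionary and prints it as JSON. The helper name `build_kernel_spec`, the display name, and the SPARK_HOME value are hypothetical placeholders; the launch command assumes ipykernel can be started with `-m ipykernel`.

```python
import json


def build_kernel_spec(python_exe="python3.4", display_name="Python 3", env=None):
    """Return a kernel spec dict in the layout Jupyter reads from kernel.json."""
    spec = {
        # Jupyter substitutes {connection_file} when it launches the kernel.
        "argv": [python_exe, "-m", "ipykernel", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }
    if env:
        # Optional environment variables passed to the kernel process.
        spec["env"] = dict(env)
    return spec


if __name__ == "__main__":
    # Hypothetical values: point SPARK_HOME at your own Spark installation.
    spec = build_kernel_spec(
        display_name="Python 3 (Spark)",
        env={"SPARK_HOME": "/path/to/spark"},
    )
    print(json.dumps(spec, indent=2, sort_keys=True))
```

The printed JSON can be compared with the kernel.json in the directory that `jupyter kernelspec list` reports for the python3 kernel.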
    • Start the Jupyter notebook (do not start it as root; run the process under the designated Jupyter administrator user ID):
      $ jupyter notebook --config=/etc/jupyter/jupyter_notebook_config.py
      Note: The URL uses https, and a login is enforced.
      (Screenshot: SSL Password Secured Jupyter Web Login)
      (Screenshot: Jupyter Python Kernel)
  6. Add Scala Kernel:
    • Pre-req: Install sbt (download from http://www.scala-sbt.org/download.html). Unpack the archive and append its bin directory to the $PATH environment variable. (ex. $ tar zxvf sbt-0.13.8.tgz -C /usr/local; export PATH=$PATH:/usr/local/sbt/bin)
    • $ git clone https://github.com/alexarchambault/jupyter-scala.git
    • $ cd jupyter-scala
    • $ sbt scala-cli/packArchive
    • $ ./jupyter-scala
  7. Add R Kernel:
    • $ conda install -c r r-essentials
    • $ conda create -n r-kernel -c r r-essentials
  8. Verify installed Kernels: $ jupyter kernelspec list
  9. Start Jupyter Notebook: $ jupyter notebook (Note: The default port is 8888. Use the --port parameter, with two dashes, to change it. Ex: $ jupyter notebook --port 8089)
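The kernel check in step 8 can also be scripted, for example to confirm that every expected kernel is registered before users log in. The sketch below parses the plain-text output of `jupyter kernelspec list`; the sample text is the output quoted in the comments on this post, and `parse_kernelspec_list` is a hypothetical helper name.

```python
def parse_kernelspec_list(output):
    """Map kernel names to directories from `jupyter kernelspec list` text."""
    kernels = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        # Skip blank lines and the "Available kernels:" header.
        if len(parts) != 2 or line.strip().endswith(":"):
            continue
        name, path = parts
        kernels[name] = path.strip()
    return kernels


SAMPLE_OUTPUT = """\
Available kernels:
python3 /usr/local/lib/python3.4/site-packages/ipykernel/resources
scala211 /root/.local/share/jupyter/kernels/scala211
"""

if __name__ == "__main__":
    found = parse_kernelspec_list(SAMPLE_OUTPUT)
    for name in ("python3", "scala211"):
        print(name, "->", found.get(name, "NOT INSTALLED"))
```

In practice, the text would come from running the command on the cluster node rather than from a hard-coded sample.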
(Screenshot: Spark Notebook, Jupyter Notebook)

Reference: jupyter.org

3 comments on "Setup Spark Notebook (Jupyter) with BigInsights v4.x"

  1. Linda, I am running into issues while generating jupyter config….

    /root/Python-3.4.1> jupyter notebook --generate-config
    Traceback (most recent call last):
      File "/usr/local/lib/python3.4/site-packages/notebook/services/sessions/sessionmanager.py", line 9, in <module>
        import sqlite3
      File "/usr/local/lib/python3.4/sqlite3/__init__.py", line 23, in <module>
        from sqlite3.dbapi2 import *
      File "/usr/local/lib/python3.4/sqlite3/dbapi2.py", line 26, in <module>
        from _sqlite3 import *
    ImportError: No module named '_sqlite3'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/bin/jupyter-notebook", line 7, in <module>
        from notebook.notebookapp import main
      File "/usr/local/lib/python3.4/site-packages/notebook/notebookapp.py", line 62, in <module>
        from .services.sessions.sessionmanager import SessionManager
      File "/usr/local/lib/python3.4/site-packages/notebook/services/sessions/sessionmanager.py", line 12, in <module>
        from pysqlite2 import dbapi2 as sqlite3
    ImportError: No module named 'pysqlite2'

    • Frank Ketelaars September 09, 2016

      Hi Shree,
      I ran into the same issue. The steps are slightly out of sequence.

      You first need to install the RHEL packages (yum install) before running “sudo ./configure --prefix=/usr/local --enable-shared LDFLAGS="-Wl,-rpath /usr/local/lib"; sudo make; sudo make altinstall”.

      After that, just resume with the remainder of the procedure and it works.
      Regards, Frank

  2. Hasan Poonawala September 21, 2016

    Hi Linda,
    Thanks for the steps. I installed the Jupyter Python kernel as per your steps and also the Scala kernel. Both were successful. However, on launching “jupyter notebook”, I see the message “0 active kernels”. Any assistance on how I can debug this?
    Also, the python config is generated at /etc/jupyter/jupyter_notebook_config.py and there is no scala config file defined.

    jupyter-kernelspec list
    Available kernels:
    python3 /usr/local/lib/python3.4/site-packages/ipykernel/resources
    scala211 /root/.local/share/jupyter/kernels/scala211

    Thanks,
    Hasan
