Installing Jupyter Notebook for Spark

1.1 How to install Python Kernel for Jupyter

1.2 How to install Scala Kernel for Jupyter

1.3 How to install R Kernel for Jupyter

1.4 How to change ports and configure the IP for accessing Spark Notebook

1.5 How to set password for web authentication

1.6 How to enable SSL for the URL

1.7 How to connect Jupyter to Spark

Note: The commands shown on their own lines are the ones you need to run

1.1 How to install Python Kernel for Jupyter:

1. sudo yum install nano zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel \
gdbm-devel db4-devel libpcap-devel xz-devel libpng-devel libjpeg-devel atlas-devel

2. sudo yum groupinstall "Development Tools" (type the quotes as plain ASCII double quotes)

3. Download Anaconda2-2.5.0-Linux-x86_64.sh (the Python 2.7 installer) from https://www.continuum.io/downloads

4. Copy the Anaconda file into the node where Jupyter must be installed (where Jupyter needs to be running)

5. bash Anaconda2-2.5.0-Linux-x86_64.sh (use the file name of the installer you actually downloaded)

6. source ~/.bashrc

7. conda install jupyter
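
You can verify that Jupyter is on your PATH before going further (jupyter --version is a standard flag that prints the installed version):
jupyter --version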

8. Start the Jupyter notebook with
jupyter-notebook --ip=hdtest100.svl.ibm.com
(The --ip value is the node where Jupyter will be running, in this case hdtest100.svl.ibm.com; for a local install, use localhost.)
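
If the node is headless, you can also pass --no-browser along with an explicit --port (both are standard Jupyter flags), for example:
jupyter-notebook --ip=hdtest100.svl.ibm.com --port=8888 --no-browser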

9. Check the kernels installed by running jupyter kernelspec list
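
On a fresh Anaconda install the output should look roughly like this (the path depends on where Anaconda is installed):
Available kernels:
  python2    /home/user/anaconda2/share/jupyter/kernels/python2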

10. Go to your browser and connect to Jupyter using this link:
http://hdtest100.svl.ibm.com:8888/
(Point to your own hostname; port 8888 is the default, and you can change it later.)

1.2 How to install Scala Kernel for Jupyter:

1. Install git with sudo yum install git

2. git clone https://github.com/alexarchambault/jupyter-scala.git

3. Install sbt build tool by running sudo yum install sbt

4. cd jupyter-scala

5. sbt cli/packArchive

6. To launch the Scala shell, run ./jupyter-scala

7. Check the kernels installed by running jupyter kernelspec list

8. Launch the Jupyter notebook
jupyter-notebook --ip=hdtest100.svl.ibm.com

9. Go to your browser and connect to Jupyter using this link:
http://hdtest100.svl.ibm.com:8888/

10. You can now choose between the Python 2 and Scala 2.11 kernels when creating a notebook.
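
Running jupyter kernelspec list again should now show both kernels; the Scala entry is typically registered as scala211 for this version, though the exact name and path vary by release:
Available kernels:
  python2     /home/user/anaconda2/share/jupyter/kernels/python2
  scala211    /home/user/.local/share/jupyter/kernels/scala211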

1.3 How to install R Kernel for Jupyter:

1. conda install -c r r-essentials

2. conda create -n r-kernel -c r r-essentials

3. source activate r-kernel

4. Check the kernels installed by running jupyter kernelspec list
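
The R kernel registers under the name ir, so a listing after all three installs might look like this (paths illustrative):
Available kernels:
  ir          /home/user/anaconda2/envs/r-kernel/share/jupyter/kernels/ir
  python2     /home/user/anaconda2/share/jupyter/kernels/python2
  scala211    /home/user/.local/share/jupyter/kernels/scala211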

1.4 How to change ports and configure the IP for accessing Spark Notebook:

1. Check to see if you have a notebook configuration file, jupyter_notebook_config.py. The default location for
this file is ~/.jupyter

2. If you don't already have one, create a config file by running jupyter notebook --generate-config

3. Edit the jupyter_notebook_config.py file

4. To change the ip and port, you can make changes to
# The IP address the notebook server will listen on.
c.NotebookApp.ip = 'hdtest100.svl.ibm.com'
# The port the notebook server will listen on.
c.NotebookApp.port = 8888

5. Once this is configured, you can launch the Jupyter notebook with just
jupyter-notebook
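
If you are not sure which configuration directory Jupyter is reading, jupyter --paths prints the config, data, and runtime search paths:
jupyter --paths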

1.5 How to set password for web authentication:

1. In a Python/IPython shell, type:
from notebook.auth import passwd; passwd()

This will prompt for a password; enter it and confirm by typing it again. You will get a hashed version of the password back.
For example, setting the password to password generates:
'sha1:3ceecb69dbcf:e34fee911737f251a1d6674c447319bc345515eb'
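
The same hash can also be produced non-interactively, since notebook.auth.passwd accepts the passphrase as an argument (note that a real password typed on the command line ends up in your shell history):
python -c "from notebook.auth import passwd; print(passwd('password'))"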

2. Set this hash in jupyter_notebook_config.py by uncommenting and editing the password parameter:
c.NotebookApp.password = u'sha1:3ceecb69dbcf:e34fee911737f251a1d6674c447319bc345515eb'

1.6 How to enable SSL for the URL:

1. A self-signed certificate can be generated with openssl.
$ openssl req -x509 -nodes -days 365 -newkey rsa:1024 -keyout mykey.key -out mycert.pem
This certificate is valid for 365 days; the private key is written to mykey.key and the certificate to mycert.pem.

2. Edit the jupyter_notebook_config.py file in ~/.jupyter
c.NotebookApp.certfile = u'/absolute/path/to/your/certificate/mycert.pem'
c.NotebookApp.keyfile = u'/absolute/path/to/your/certificate/mykey.key'

3. Now you can access Jupyter with this url: https://hdtest100.svl.ibm.com:8888/
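
You can sanity-check the TLS setup from another shell with openssl s_client (a standard openssl subcommand); it should print the details of your self-signed certificate:
openssl s_client -connect hdtest100.svl.ibm.com:8888 < /dev/null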

1.7 How to connect Jupyter to Spark:

PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" $SPARK_HOME/bin/pyspark --master local[*]

where $SPARK_HOME is an environment variable set to the Spark installation directory.
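
Putting it together, if Spark were installed under /usr/iop/4.2.0.0/spark (an illustrative path; substitute your own), the full launch would be:
export SPARK_HOME=/usr/iop/4.2.0.0/spark
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" $SPARK_HOME/bin/pyspark --master local[*]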

9 comments on "How to install Jupyter Notebook for Spark"

  1. Thank you!

  2. Thanks tremendously for this (installing the Scala kernel). Worked first time.

  3. ChekadSarami January 15, 2017

    Hi,

    Thank you for your tutorial. I cannot get Jupyter to work with pyspark. I am a newbie to all of this.

    Where should I put your code in 1.7? Should I put it in pyspark file inside $SPARK_HOME/bin ?

    PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" $SPARK_HOME/bin/pyspark --master local[*]

    When I run it in my terminal I get the exception error:

    Exception in thread "main" java.lang.IllegalArgumentException: pyspark does not support any application options.
    at org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:242)
    at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildPySparkShellCommand(SparkSubmitCommandBuilder.java:241)
    at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:117)
    at org.apache.spark.launcher.Main.main(Main.java:86)

    -The last three lines of my pyspark file look like this:

    export PYSPARK_DRIVER_PYTHON
    export PYSPARK_DRIVER_PYTHON_OPTS
    exec "${SPARK_HOME}"/bin/spark-submit pyspark-shell-main --name "PySparkShell" "$@"

    Any help would be greatly appreciated.

    Thanks

    CS

    • Hi,

      In my case, I have spark installed under /usr/iop/4.2.0.0/spark/.
      $SPARK_HOME = /usr/iop/4.2.0.0/spark/
      So, to start PySpark, I run the command below from my PuTTY session to launch the Spark shell:

      PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" $SPARK_HOME/bin/pyspark --master local[*]

      You can email me on mail-id if you have more questions.

      • Morten Espelid April 17, 2017

        It worked for me when putting jupyter in quotation marks as well:
        PYSPARK_DRIVER_PYTHON="jupyter" PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark

  4. Hi, I am getting the same error:
    Exception in thread "main" java.lang.IllegalArgumentException: pyspark does not support any application options.
    at org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:242)
    at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildPySparkShellCommand(SparkSubmitCommandBuilder.java:241)
    at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:117)
    at org.apache.spark.launcher.Main.main(Main.java:86)

    Where do I put the code? In .bashrc?
    I just want to connect pyspark to Jupyter. Both are running individually.

  5. Vijayakumar August 01, 2019

    When I try to install the Scala kernel for Jupyter, I am not able to run the command (sbt cli/packArchive) given above.

    I have pasted my error below. It says that 'cli' is not a valid command. Can anyone help me with this?

    [root@vijay1 jupyter-scala]# sbt cli/pack
    [info] Loading settings for project jupyter-scala-build-build from plugins.sbt …
    [info] Loading project definition from /root/jupyter-scala/project/project
    [info] Loading settings for project jupyter-scala-build from plugins.sbt …
    [info] Loading project definition from /root/jupyter-scala/project
    [info] Loading settings for project almond from build.sbt …
    [info] Set current project to almond (in build file:/root/jupyter-scala/)
    [error] Expected ID character
    [error] Not a valid command: cli (similar: client, alias, plugin)
    [error] Expected project ID
    [error] Expected configuration
    [error] Expected ':'
    [error] Expected key
    [error] Not a valid key: cli (similar: clean, doc)
    [error] cli/pack
    [error] ^
