Introduction
A previous post demonstrated how to install and set up Jupyter Notebook on an IBM Open Platform (IOP) cluster. This recipe concentrates on installing and setting up Jupyter Notebook on Hortonworks Data Platform (HDP).

Jupyter Notebook is a web application that allows creating and sharing documents that contain live code, equations, visualizations and explanatory text.

A notebook is interactive, so you can execute code directly from a web browser. Jupyter supports multiple kernels for different programming languages.

Objective

This technical recipe shows readers how to install and set up Jupyter Notebook with a Python kernel on an HDP cluster.

Versions Tested

    • HDP v2.6.2
    • Python v2.7.x
    • Apache Spark v1.6 or above
    • RHEL v7.2, CentOS v7.2
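
Before starting, it can help to compare what is installed on your node against the versions listed above. The following is a minimal sketch, not part of the original recipe: `version_ge` is a hypothetical helper name, the Spark version shown is a placeholder you would replace with the value from your own cluster, and the comparison assumes GNU `sort` with the `-V` option.

```shell
#!/bin/bash
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort), which ships with RHEL/CentOS 7.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Placeholder value: replace with the Spark version reported on your cluster.
spark_version="1.6.3"

# Compare against the 1.6 minimum from the list of tested versions.
if version_ge "$spark_version" "1.6"; then
    echo "Spark $spark_version is within the tested range"
else
    echo "Spark $spark_version is older than the tested 1.6"
fi
```

The same helper can be reused for other components, for example `version_ge "$python_version" "2.7"`.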

Steps

  1. Prepare prerequisites
    • Extra Packages for Enterprise Linux (EPEL) is required for installing data science related Python libraries. Here is an example of how to enable EPEL. Note: Substitute the rpm file that matches your OS version.
        $ wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
        $ sudo rpm -ivh epel-release-latest-7.noarch.rpm
    • Once EPEL is enabled, prepare Python for the next step.
        $ sudo yum upgrade python-setuptools
  2. Install a Python package management system so that you can install extra Python libraries. You can choose either Anaconda or pip. This post uses pip; reference commands for Anaconda are listed as well, but you only need one of the two.
    • Install pip:
        $ wget https://bootstrap.pypa.io/ez_setup.py -O - | sudo python
        $ sudo yum install python-pip python-wheel python-devel gcc
    • OR install Anaconda (optional):
        Download the installer script that matches your Python version. This example is for Python 2.7.
        $ sudo wget https://repo.continuum.io/archive/Anaconda2-5.0.0.1-Linux-x86_64.sh
        Run the installer:
        $ bash Anaconda2-5.0.0.1-Linux-x86_64.sh
  3. Install a few basic data science related Python libraries:
        $ pip install --upgrade pip wheel pandas numpy scipy scikit-learn matplotlib virtualenv
  4. Install Jupyter Notebook:
        $ pip install jupyter
  5. Set up the Jupyter Notebook configuration file:
        $ jupyter notebook --generate-config
        $ sudo mkdir -p /ibm/conf
        $ sudo chown -R spark:hadoop /ibm
        $ cp ~/.jupyter/jupyter_notebook_config.py /ibm/conf/
  6. (Optional) You can set default parameters for Jupyter Notebook in the configuration file. Following are a few examples. Also consult the previous post for information on setting up security.
      $ cat /ibm/conf/jupyter_notebook_config.py
      c.NotebookApp.notebook_dir = u'/ibm/notebook_repo'
      c.NotebookApp.ip = '*'
      c.NotebookApp.port = 8889
  7. (Optional) You can create an auto-start shell script to start Jupyter Notebook when the system boots. Note: The example script shows how to make Jupyter recognize the cluster's Spark and Hadoop libraries so that you don't need to construct a Spark context object in every notebook file.

    Here is an example:
    $ cat /ibm/scripts/start_jupyter.sh

    [Image: start_jupyter.sh script listing, captioned "Start Jupyter Process"]
  8. Usage: $ /ibm/scripts/start_jupyter.sh spark
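
A start script along the lines described in step 7 could look like the following sketch. It is not the original script. The function names (`setup_spark_env`, `jupyter_cmd`, `start_jupyter`), the log file location, and the use of `sudo -E -u` to switch users are all assumptions made for illustration. The HDP client paths are the conventional `/usr/hdp/current/...` locations and should be verified on your cluster. The sketch also assumes the IPython kernel runs the file named in `PYTHONSTARTUP` (its default behavior), which is what lets Spark's `pyspark/shell.py` create the Spark context ahead of the first notebook cell.

```shell
#!/bin/bash
# Hypothetical start_jupyter.sh sketch; config path and port follow this recipe.

JUPYTER_CONFIG=/ibm/conf/jupyter_notebook_config.py
JUPYTER_PORT=8889

# Point PySpark and the Hadoop client at the HDP installs (assumed locations).
setup_spark_env() {
    export SPARK_HOME=/usr/hdp/current/spark-client
    export HADOOP_HOME=/usr/hdp/current/hadoop-client
    export HADOOP_CONF_DIR="$HADOOP_HOME/conf"
    export PYSPARK_SUBMIT_ARGS="--master yarn-client pyspark-shell"

    # Spark bundles py4j as a zip whose version varies by release, so glob for it.
    local py4j_zip
    py4j_zip=$(ls "$SPARK_HOME"/python/lib/py4j-*-src.zip 2>/dev/null | head -n 1)
    export PYTHONPATH="$SPARK_HOME/python${py4j_zip:+:$py4j_zip}${PYTHONPATH:+:$PYTHONPATH}"

    # Assumption: the IPython kernel executes PYTHONSTARTUP on startup,
    # so pyspark/shell.py can define `sc` before any notebook cell runs.
    export PYTHONSTARTUP="$SPARK_HOME/python/pyspark/shell.py"
}

# Print the Jupyter command line rather than running it, so it can be inspected.
jupyter_cmd() {
    echo "jupyter notebook --config=$JUPYTER_CONFIG --port=$JUPYTER_PORT --no-browser"
}

# Launch Jupyter in the background as the given user (needs sudo rights).
start_jupyter() {
    local run_user=$1
    setup_spark_env
    echo "Starting Jupyter daemon on HDP cluster as $run_user ..."
    # -E keeps the exported Spark variables, subject to the local sudoers policy.
    nohup sudo -E -u "$run_user" $(jupyter_cmd) >/tmp/jupyter_"$run_user".log 2>&1 &
}

# Only start when a user name is passed, e.g. ./start_jupyter.sh spark
if [ "$#" -ge 1 ]; then
    start_jupyter "$1"
fi
```

Keeping the environment setup in its own function makes it easy to source the file and inspect the variables before wiring the script into the boot sequence.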

Reference: jupyter.org

1 comment on "Setup Jupyter Notebook on Hortonworks Data Platform (HDP)"

  1. tianoklein@hotmail.com April 30, 2018

    #!/bin/bash
    set -x
    USER=$1
    JUPYTER_HOST=
    JUPYTER_PORT=8889
    export SPARK_HOME=/usr/hdp/current/spark-client
    export PYSPARK_SUBMIT_ARGS="--master yarn-client pyspark-shell"
    export HADOOP_HOME=/usr/hdp/current/hadoop-client
    export HADOOP_CONF_DIR=/usr/hdp/current/hadoop-client/conf
    export PYTHONPATH=/usr/hdp/current/spark-client/python:/usr/hdp/current/spark-[illegible]
    echo "Starting Jupyter daemon on HDP Cluster ..."
    jupyter notebook --config=/ibm/conf/jupyter_notebook_config.py --ip=$JUPYTER_HOST [illegible]
    exit
