In this tutorial, learn how to find data sets on the Red Hat Marketplace, then set up Jupyter Notebooks on Red Hat OpenShift. In this example, you use the IBM Debater Sentiment Composition Lexicons data set.
The following image shows the OpenShift architecture diagram. The data set is stored in a Persistent Volume Claim (PVC), and the Jupyter Notebook image is in a pod on OpenShift. Notice how the PVC and pod are not connected. In this tutorial, you learn how to mount the PVC to the pod to access the data.
The following definitions help you understand the diagram better.
- Pod: One or more containers deployed together on one host. To a container, a pod is like a machine instance. The Jupyter Notebook image runs in a pod.
- Deployment configuration: Acts like a pod template and describes how the pod should be deployed. Essentially, it configures how to start the Jupyter Notebook image.
- Persistent Volume Claim (PVC): A request for storage. It holds the data set that you download from Red Hat Marketplace. The PVC requests persistent volume (PV) resources without specific knowledge of the underlying storage infrastructure, so in effect it claims storage space from a persistent volume.
To follow this tutorial, you need:
- A Red Hat Marketplace account.
- An OpenShift cluster.
- The OpenShift CLI. Follow the Configure your OpenShift cluster with Red Hat Marketplace tutorial to complete the prerequisites.
  - At the end of step 1, make sure to add the oc binary file to your PATH. For example, for Mac users:
    mv /<filepath>/oc /usr/local/bin/oc
    If additional help is needed to set up the OpenShift CLI, look at this documentation.
  - You can skip step 2.
  - In step 3, name your project whatever you like.
- The Helm package manager, which is needed to mount the data set to OpenShift.
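Before you continue, it can help to confirm that both command-line tools are reachable from your terminal. The following is a minimal sketch, not part of the original prerequisites. The function name missing_tools is hypothetical, and the check only tests whether a name resolves on your PATH, not whether the installed version is suitable.

```shell
# Print the name of every required tool that is not on the PATH.
# Tool names are passed as arguments, for example: missing_tools oc helm
missing_tools() {
  for tool in "$@"; do
    # command -v succeeds only if the shell can resolve the name
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage (prints nothing when both tools are installed):
#   missing_tools oc helm
```

If the function prints oc or helm, install that tool or fix your PATH before you move on to step 1.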
It should take you approximately 45 minutes to complete this tutorial.
The prerequisite tutorial explained the OpenShift web console and the OpenShift CLI. You’ll use both methods to access your OpenShift cluster, so keep the following differences in mind:
- OpenShift command line interface (CLI): This is accessed with the oc command on your terminal.
- OpenShift web console: This is accessed on a web browser. To access the console:
  - In the upper-left corner of IBM Cloud, click the hamburger icon (navigation menu), then click OpenShift.
  - Click on your cluster, then go to the Overview page of the cluster. It should look similar to the following image after it loads.
  - Click OpenShift web console.
  You are directed to the Red Hat OpenShift web console.
Step 1. Download the data set from Red Hat Marketplace
- Go to the Red Hat Marketplace and log in.
- Search for and click IBM Debater Sentiment Composition Lexicons.
Select Get it free, then choose OpenShift as the download location. You should see something similar to the following image.
Recall that you already have the OpenShift CLI and an OpenShift cluster. You should already be logged in to your OpenShift CLI. If not, use something like the following command (see step 4 in the prerequisite tutorial):
oc login --token=<TOKEN> --server=<URL>
Switch to the project you created in step 3 of the prerequisite tutorial.
oc project <project_name>
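If you are unsure which project is active, you can check it before you mount anything. The sketch below assumes that oc project -q prints only the name of the current project. The function name in_project and the project name my-project are placeholders, not names from the prerequisite tutorial.

```shell
# Succeed only when the active project matches the expected name.
# Relies on `oc project -q`, which prints just the current project name.
in_project() {
  expected="$1"
  current="$(oc project -q 2>/dev/null)" || return 1
  [ "$current" = "$expected" ]
}

# Usage, with my-project standing in for your own project name:
#   in_project my-project || oc project my-project
```

The usage line switches projects only when you are not already in the expected one.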
Follow the Steps to mount a storage object. When you “Mount to OpenShift” and are asked to choose a namespace, use the project you just switched to. You can skip step 4: Connect to application.
If everything worked, you should see:
You have mounted the data set to OpenShift. It is stored in a PVC, which is storage on OpenShift. Save the PVC name that is returned because you use it in step 4. It should look similar to rhm-dl-rhmccp-4e7ceec1-7a48-492c-9639-7ffb2d4f6f6e-pvc.
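You can also confirm from the CLI that the PVC exists in your project. This is a hedged sketch that reads the claim's phase with a jsonpath query. The function name pvc_phase is hypothetical, and <pvc_name> stands for the name that Red Hat Marketplace returned to you.

```shell
# Print the phase of a PVC (for example "Bound" or "Pending").
# The PVC name is passed as the first argument.
pvc_phase() {
  oc get pvc "$1" -o jsonpath='{.status.phase}'
}

# Usage, substituting the PVC name returned by Red Hat Marketplace:
#   pvc_phase <pvc_name>
```

A phase of Bound means that storage has been allocated. On some storage classes a claim can stay Pending until a pod first uses it, so Pending at this point is not necessarily an error.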
Step 2. Create Jupyter Notebook image
In the OpenShift CLI, make sure that you are in the correct project.
oc project <project_name>
To run a Jupyter Notebook with OpenShift, you must build a template image. In this tutorial, you use the Source-to-Image (S2I) build process to create a minimal Jupyter Notebook image. Using this S2I, you can create other Jupyter Notebooks. First, using the OpenShift CLI, create the minimal notebook.
oc create -f https://raw.githubusercontent.com/jupyter-on-openshift/jupyter-notebooks/master/build-configs/s2i-minimal-notebook.json
You can follow the progress of creating the notebook (this might take a few minutes).
oc logs --follow bc/s2i-minimal-notebook-py36
When complete, check that the minimal notebook was created.
oc describe imagestream s2i-minimal-notebook
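If you prefer a yes-or-no check over reading the describe output, you can test whether a specific image stream tag exists. The sketch below assumes that the build publishes the tag s2i-minimal-notebook:3.6, which is the image name that the next step refers to. The function name image_tag_exists is hypothetical.

```shell
# Succeed when the given image stream tag (name:tag) exists in the
# current project; all output is discarded.
image_tag_exists() {
  oc get imagestreamtag "$1" >/dev/null 2>&1
}

# Usage:
#   image_tag_exists s2i-minimal-notebook:3.6 && echo "image is ready"
```

If the check fails, the build may still be running. Follow the build log again before you continue.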
Step 3. Create Jupyter Notebook template
Download a Jupyter Notebook template to more easily deploy notebooks. The template automatically sets deployment configurations and uses the s2i-minimal-notebook:3.6 image that you just created. Use the notebook-deployer template.
oc create -f https://raw.githubusercontent.com/jupyter-on-openshift/jupyter-notebooks/master/templates/notebook-deployer.json
If you want all templates, use the following commands.
oc create -f https://raw.githubusercontent.com/jupyter-on-openshift/jupyter-notebooks/master/templates/notebook-deployer.json
oc create -f https://raw.githubusercontent.com/jupyter-on-openshift/jupyter-notebooks/master/templates/notebook-builder.json
oc create -f https://raw.githubusercontent.com/jupyter-on-openshift/jupyter-notebooks/master/templates/notebook-quickstart.json
oc create -f https://raw.githubusercontent.com/jupyter-on-openshift/jupyter-notebooks/master/templates/notebook-workspace.json
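Because the four template URLs differ only in the file name, you could generate them in a loop instead of typing each one. This is a small sketch. The variable TEMPLATE_BASE and the function template_urls are names introduced here for illustration.

```shell
# Base URL of the templates directory used in the commands above.
TEMPLATE_BASE="https://raw.githubusercontent.com/jupyter-on-openshift/jupyter-notebooks/master/templates"

# Print one template URL per line.
template_urls() {
  for name in deployer builder quickstart workspace; do
    printf '%s/notebook-%s.json\n' "$TEMPLATE_BASE" "$name"
  done
}

# Usage: create every template in turn.
#   template_urls | while read -r url; do oc create -f "$url"; done
```

If you already created the notebook-deployer template, oc create reports that it already exists for that one URL. The remaining templates are still created.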
On the OpenShift web console, refresh the page. Make sure that you are in the Developer role by checking at the upper left. Click +Add, then click From Catalog, clear the filters, and search for Jupyter Notebook.
Create the Jupyter Notebook by selecting the deployer notebook.
Instantiate the template. Everything should be the default. In this example, you use dax-sentiment-notebook as the APPLICATION_NAME. The NOTEBOOK_PASSWORD is used to access the Jupyter Notebook.
After you click Create, a new pod and deployment configuration (DC) should be created. If you click the circle, you see that 1 pod (dax-sentiment-notebook-[ID]) is running.
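If you would rather stay in the terminal, the same template can probably be instantiated with oc new-app. The following is a sketch under two assumptions: that the template is registered as notebook-deployer, and that it accepts the APPLICATION_NAME and NOTEBOOK_PASSWORD parameters shown in the web console form. The function name deploy_notebook is hypothetical.

```shell
# Instantiate the notebook-deployer template from the CLI.
#   $1: application name (used for the deployment configuration)
#   $2: password for the Jupyter Notebook
deploy_notebook() {
  oc new-app --template notebook-deployer \
    --param APPLICATION_NAME="$1" \
    --param NOTEBOOK_PASSWORD="$2"
}

# Usage, with a password of your choice:
#   deploy_notebook dax-sentiment-notebook <password>
```

Use either the web console or the CLI, not both, because a second instantiation with the same application name conflicts with the objects that already exist.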
(OPTIONAL) Test that you can now launch the Jupyter Notebook. After your pod is created (it might take a few minutes), you can check whether the Jupyter Notebook instance was launched correctly. In the OpenShift web console, select Developer in the drop-down menu at the upper left. Select Topology, and click the dax-sentiment-notebook that you just created.
Under Routes, click the location link to launch the Jupyter Notebook. A new tab opens. You might need to enter the NOTEBOOK_PASSWORD that you set previously.
The Notebook list should be empty.
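You can also look up the notebook address from the CLI instead of clicking through the Topology view. This sketch assumes that the template created a route with the same name as the application and that the route is served over HTTPS. Neither is confirmed by this tutorial, and the function name notebook_url is hypothetical.

```shell
# Print the HTTPS URL of the route with the given name.
notebook_url() {
  host="$(oc get route "$1" -o jsonpath='{.spec.host}')" || return 1
  printf 'https://%s\n' "$host"
}

# Usage:
#   notebook_url dax-sentiment-notebook
```

Open the printed URL in a browser, and enter the NOTEBOOK_PASSWORD if you are prompted.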
Step 4. Connect PVC to pod
Remember the OpenShift architecture diagram at the beginning of this tutorial where the PVC and the pod are not connected? In other words, you set up the Jupyter Notebook (pod), but you cannot access any of the data (PVC) from it. This step allows the pod to access the PVC.
The deployment configuration object for the pod is named dax-sentiment-notebook, and the data is stored in the PVC named rhm-dl-rhmccp-4e7ceec1-7a48-492c-9639-7ffb2d4f6f6e-pvc (from step 1). Run the following command to connect the pod and PVC. You might have to update the claim-name with your PVC name:
oc set volume dc/dax-sentiment-notebook --add -t='persistentVolumeClaim' --mount-path=/data --claim-name=rhm-dl-rhmccp-4e7ceec1-7a48-492c-9639-7ffb2d4f6f6e-pvc
To understand the command in more detail:
- oc set volume: The object you are adding a volume to is the deployment configuration dc/dax-sentiment-notebook.
- --add -t: To this object, you are adding a volume of type persistentVolumeClaim.
- --mount-path: The location you are mounting the PVC to is /data. You can name this anything, but this tutorial uses /data.
- --claim-name: The name of the PVC you are mounting is rhm-dl-rhmccp-4e7ceec1-7a48-492c-9639-7ffb2d4f6f6e-pvc.
For more information about these commands, see the OpenShift documentation.
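Because only the deployment configuration name, the PVC name, and the mount path change between setups, the command can be wrapped in a small function. This is a sketch that uses the same flags as the command in this step. The function name mount_pvc is introduced here for illustration.

```shell
# Add a PVC to a deployment configuration as a mounted volume.
#   $1: deployment configuration name
#   $2: PVC name
#   $3: mount path inside the pod
mount_pvc() {
  oc set volume "dc/$1" --add -t persistentVolumeClaim \
    --mount-path="$3" --claim-name="$2"
}

# Usage, substituting your own PVC name:
#   mount_pvc dax-sentiment-notebook <pvc_name> /data
```

Run the mount only once per path. Repeating it with the same mount path is likely to be rejected because a volume is already mounted there.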
Verify that the PVC is mounted to the pod by running the following command. The output confirms that, in your pod, the data is stored in the /data directory.
oc set volume dc --all
dax-sentiment-notebook pvc/rhm-dl-rhmccp-4e7ceec1-7a48-492c-9639-7ffb2d4f6f6e-pvc (allocated 8GiB) as volume-kv6dh mounted at /data
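If you want a scripted check instead of reading the listing by eye, you can search the output for the PVC name and the mount path. This is a loose textual check and a sketch only: it assumes the output contains the strings pvc/<name> and mounted at <path>, as in the example output above. The function name volume_mounted is hypothetical.

```shell
# Read a volume listing on stdin and succeed when it mentions both
# the PVC ($1) and the mount path ($2). This is a plain substring
# match, so /data would also match a longer path such as /data2.
volume_mounted() {
  listing="$(cat)"
  printf '%s\n' "$listing" | grep -qF -- "pvc/$1" &&
    printf '%s\n' "$listing" | grep -qF -- "mounted at $2"
}

# Usage, substituting your own PVC name:
#   oc set volume dc --all | volume_mounted <pvc_name> /data && echo "mounted"
```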
Step 5. Putting it all together
In the OpenShift web console, click Pod dax-sentiment-notebook. Under Pods, click dax-sentiment-notebook-[ID].
Select the Terminal tab, and type ls /data. You should see the sentiment-composition-lexicons.tar file listed.
Extract sentiment-composition-lexicons.tar with the following command, which unpacks the files into the current working directory.
tar -xvf /data/sentiment-composition-lexicons.tar
If you list the directory now (for example, with ls), you should see several files:
ADJECTIVES.xlsx LEXICON_UG.txt ReleaseNotes.txt LEXICON_BG.txt LICENSE SEMANTIC_CLASSES.xlsx
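To confirm that the archive unpacked completely, you can compare the directory against the six file names listed above. The following is a minimal sketch. The function name missing_lexicon_files is hypothetical, and the file list is copied from the listing in this step.

```shell
# Print each expected data set file that is absent from a directory.
# The directory defaults to the current one.
missing_lexicon_files() {
  dir="${1:-.}"
  for f in ADJECTIVES.xlsx LEXICON_UG.txt ReleaseNotes.txt \
           LEXICON_BG.txt LICENSE SEMANTIC_CLASSES.xlsx; do
    [ -e "$dir/$f" ] || printf '%s\n' "$f"
  done
}

# Usage (prints nothing when every file is present):
#   missing_lexicon_files .
```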
Launch the dax-sentiment-notebook on the OpenShift web console.
You should see all of the files. Now, you can create a new notebook and start using the data set.
This tutorial explained how to launch an OpenShift cluster from IBM Cloud, how to download a data set from Red Hat Marketplace, and how to mount it to your OpenShift cluster as a PVC. Additionally, you learned how to create a Jupyter Notebook image pod on OpenShift and how to connect it to the PVC.