Tutorial

Serve Watson NLP models using Knative Serving

Learn how to serve pretrained Watson NLP models using Knative Serving in a Red Hat OpenShift cluster

By Michael Spriggs and Himadri Talukder

Archived content

Archive date: 2024-05-29

This content is no longer being updated or maintained. The content is provided “as is.” Given the rapid evolution of technology, some content, steps, or illustrations may have changed.

With IBM Watson NLP, IBM introduced a common library for natural language processing, document understanding, translation, and trust. IBM Watson NLP brings everything under one umbrella for consistency and ease of development and deployment. This tutorial walks you through the steps to serve pretrained Watson NLP models using Knative Serving in a Red Hat OpenShift cluster.

Knative Serving is an open source, enterprise-level solution to build serverless and event-driven applications in Kubernetes and Red Hat OpenShift clusters. It supports horizontal autoscaling based on the requests that come into a service, allowing the service to scale down to zero replicas. For more information, see the Knative documentation.

In this tutorial, you create a Knative service to run the Watson NLP runtime. Pods of this Knative service specify Watson NLP pretrained model images as init containers. These init containers run to completion before the main application starts in the pod. They provision models to the emptyDir volume of the pod. When the Watson NLP runtime container starts, it loads the models and begins serving them.

This approach allows for models to be kept in separate container images from the runtime container image. To change the set of served models, you need only update the Knative service manifest.
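
To make this pattern concrete, here is a minimal sketch of such a manifest. The container names, image references, mount path, and environment variable below are illustrative placeholders patterned on IBM's published Watson NLP examples; the actual manifest used in this tutorial is the knative-service.yaml file in the repository that you clone in Step 2.

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: watson-nlp-kn
spec:
  template:
    spec:
      initContainers:
        # Each pretrained model ships as its own image; the init container
        # copies its model into the shared emptyDir volume and exits.
        - name: tone-classification-model
          image: <registry>/<model-image>:<tag>
          volumeMounts:
            - name: models
              mountPath: /app/models
      containers:
        # The runtime starts after all init containers complete and loads
        # every model it finds in the shared volume.
        - name: watson-nlp-runtime
          image: <registry>/watson-nlp-runtime:<tag>
          env:
            - name: LOCAL_MODELS_DIR
              value: /app/models
          volumeMounts:
            - name: models
              mountPath: /app/models
      volumes:
        - name: models
          emptyDir: {}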

Reference architecture

Figure: Reference architecture for the Knative deployment pattern

Prerequisites

To run this tutorial, you must:

  • Install Docker Desktop.
  • Ensure that you have access to a Red Hat OpenShift container platform account with cluster administrator access.
  • Install the Red Hat OpenShift CLI (oc), and log in to the Red Hat OpenShift cluster.
  • Create a Docker registry secret in the Kubernetes project that grants access to the Watson NLP Runtime and pretrained models (see the example command after this list).
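
For the registry secret, a command along these lines is typical. The secret name here is a placeholder, and you should confirm the registry address and credentials against your IBM entitlement documentation.

oc create secret docker-registry watson-nlp-registry-secret \
  --docker-server=cp.icr.io \
  --docker-username=cp \
  --docker-password=<your-IBM-entitlement-key>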

Tip: Podman provides a Docker-compatible command-line front end. Unless otherwise noted, all of the Docker commands in this tutorial also work with Podman if you alias the Docker CLI with the shell command alias docker=podman.

Steps

Step 1. Configure Knative

Note: Skip this step if you are using the sandbox environment.

The deployment approach that we use in this tutorial relies on capabilities of Knative Serving that are disabled by default. You'll configure Knative Serving to enable init containers and emptyDir volumes.

To apply the configuration, use the following command. (On an OpenShift cluster, kubectl and oc are interchangeable for applying this manifest.)

kubectl apply -f - <<EOF
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  config:
    features:
      kubernetes.podspec-init-containers: "enabled"
      kubernetes.podspec-volumes-emptydir: "enabled"
EOF
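
The Knative operator writes these feature flags into the config-features ConfigMap in the knative-serving namespace. As an optional check once the operator has reconciled the change, confirm that both flags show as enabled:

oc get configmap config-features -n knative-serving -o yaml \
  | grep -E "podspec-(init-containers|volumes-emptydir)"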

Step 2. Clone the GitHub repository

Clone the repository that contains the code that is used in this tutorial.

git clone https://github.com/ibm-build-labs/Watson-NLP
cd Watson-NLP/MLOps/Watson-NLP-Knative/deployment
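
Before applying it in the next step, you can inspect the Knative service manifest to see the init container and emptyDir volume pattern described earlier:

cat knative-service.yaml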

Step 3. Deploy the model service

In this step, you create a Knative Service to run the Watson NLP runtime. When a service is created, Knative does the following:

  • It creates a new immutable revision for this version of the application.
  • It creates a route, ingress, service, and load balancer for your application.
  • It automatically scales replicas based on request load, including scaling to zero active replicas.

  1. Create the service by running the following command.

    oc apply -f knative-service.yaml
    
  2. Verify that the service has been created.

    oc get configuration
    
     You should see output similar to the following.

    NAME            LATESTCREATED         LATESTREADY           READY   REASON
    watson-nlp-kn   watson-nlp-kn-00001   watson-nlp-kn-00001   True
    
  3. Use the following command to check the revisions of this service.

    oc get revisions
    
  4. Set the URL for the service in an environment variable. (You can echo the variable to confirm it, as shown below.)

    export SERVICE_URL=$(oc get ksvc watson-nlp-kn -o jsonpath="{.status.url}")
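
     To confirm that the variable is set, echo it. The URL's scheme and host depend on your cluster configuration, but it typically looks like the example in the comment:

    echo ${SERVICE_URL}
    # e.g. https://watson-nlp-kn-<namespace>.<cluster-domain>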
    

Step 4. Test Knative autoscaling

With the parameters used when creating the service, Knative autoscales pods based on requests, including scaling to zero when there are no requests.
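
Scale-to-zero is Knative's default behavior, and the scaling bounds can be tuned with Knative autoscaling annotations on the service's revision template. The annotation names below are from the Knative autoscaling API; the values shown are only illustrative, so check knative-service.yaml for what this tutorial actually deploys.

spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "0"   # allow scaling down to zero pods
        autoscaling.knative.dev/max-scale: "2"   # cap the number of pods
        autoscaling.knative.dev/target: "10"     # target in-flight requests per pod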

  1. Run the following command to list the pods in your Red Hat OpenShift project.

    oc get pods
    

     Pods belonging to the Knative service should have the prefix watson-nlp-kn. Initially, there should be none. If you do see some, wait a few minutes; they will be terminated automatically.

  2. Run the following command to trigger the Knative service to start pods.

    curl ${SERVICE_URL}
    
  3. Use ctrl-c to break out of the command.

  4. Use the following command to see the pods being created in response to the request, and then later terminated.

    oc get pods -w
    

    The output is similar to the following.

    NAME                                              READY   STATUS     RESTARTS   AGE
    watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   0/2     Init:0/1   0          15s
    watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   0/2     PodInitializing   0          75s
    watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   1/2     Running           0          76s
    watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   2/2     Running           0          2m
    watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   2/2     Terminating       0          3m
    watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   1/2     Terminating       0          3m20s
    watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   1/2     Terminating       0          3m30s
    watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   0/2     Terminating       0          3m32s
    
  5. Use ctrl-c to break out of the command.
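
Optionally, to watch Knative scale up under load rather than after a single cold start, you can fire a burst of concurrent requests. This loop is an illustrative extra, not one of the tutorial's original steps:

# Send 50 concurrent requests, then watch pods scale up and, later, back down to zero
for i in $(seq 1 50); do curl -s "${SERVICE_URL}" > /dev/null & done; wait
oc get pods -w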

Step 5. Test the service

In this step, you make an inference request on the model using the REST interface. Run the following command.

curl -X POST "${SERVICE_URL}/v1/watson.runtime.nlp.v1/NlpService/ClassificationPredict" \
  -H "accept: application/json" \
  -H "grpc-metadata-mm-model-id: classification_ensemble-workflow_lang_en_tone-stock" \
  -H "content-type: application/json" \
  -d '{ "rawDocument": { "text": "Watson nlp is awesome! works in knative" }}' | jq

You see output similar to the following example.

{
  "classes": [
    {
      "className": "satisfied",
      "confidence": 0.6308287
    },
    {
      "className": "excited",
      "confidence": 0.5176963
    },
    {
      "className": "polite",
      "confidence": 0.3245624
    },
    {
      "className": "sympathetic",
      "confidence": 0.1331128
    },
    {
      "className": "sad",
      "confidence": 0.023583649
    },
    {
      "className": "frustrated",
      "confidence": 0.0158445
    },
    {
      "className": "impolite",
      "confidence": 0.0021891927
    }
  ],
  "producerId": {
    "name": "Voting based Ensemble",
    "version": "0.0.1"
  }
}
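
In the sample output, the classes are sorted by descending confidence, so you can extract just the top prediction by piping the same request through a jq filter:

curl -s -X POST "${SERVICE_URL}/v1/watson.runtime.nlp.v1/NlpService/ClassificationPredict" \
  -H "accept: application/json" \
  -H "grpc-metadata-mm-model-id: classification_ensemble-workflow_lang_en_tone-stock" \
  -H "content-type: application/json" \
  -d '{ "rawDocument": { "text": "Watson nlp is awesome! works in knative" }}' \
  | jq -r '.classes[0] | "\(.className): \(.confidence)"'

This prints a single line such as satisfied: 0.6308287.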

Summary

In this tutorial, you deployed a pretrained Watson NLP model on a Red Hat OpenShift cluster using a Knative service. The model images are specified as init containers in the Kubernetes manifest. You also observed Knative autoscaling in action, including scaling to zero.

Take a look at more embeddable AI content on IBM Developer, or try out the IBM Natural Language Processing Library for Embed.