Tutorial
Serve Watson NLP models using Knative Serving
Learn how to serve pretrained Watson NLP models using Knative Serving in a Red Hat OpenShift cluster
Archived content
Archive date: 2024-05-29
This content is no longer being updated or maintained. The content is provided “as is.” Given the rapid evolution of technology, some content, steps, or illustrations may have changed.

With IBM Watson NLP, IBM introduced a common library for natural language processing, document understanding, translation, and trust. IBM Watson NLP brings everything under one umbrella for consistency and ease of development and deployment. This tutorial walks you through the steps to serve pretrained Watson NLP models using Knative Serving in a Red Hat OpenShift cluster.
Knative Serving is an open source, enterprise-level solution to build serverless and event-driven applications in Kubernetes and Red Hat OpenShift clusters. It supports horizontal autoscaling based on the requests that come into a service, allowing the service to scale down to zero replicas. For more information, see the Knative documentation.
In this tutorial, you create a Knative service to run the Watson NLP runtime. Pods of this Knative service specify Watson NLP pretrained model images as init containers. These init containers run to completion before the main application starts in the pod. They provision models to the `emptyDir` volume of the pod. When the Watson NLP runtime container starts, it loads the models and begins serving them.
This approach allows for models to be kept in separate container images from the runtime container image. To change the set of served models, you need only update the Knative service manifest.
Reference architecture
Prerequisites
To run this tutorial, you must:
- Install Docker Desktop.
- Ensure that you have access to a Red Hat OpenShift container platform account with cluster administrator access.
- For this tutorial, IBM personnel and Business Partners can reserve a sandbox environment. When you reserve the environment, a project is created for you in a Red Hat OpenShift cluster. You receive an email with instructions on accessing the environment.
- Alternatively, if you are using your own cluster, use the following instructions to install Knative Serving.
- Install the Red Hat OpenShift CLI (`oc`), and log in to the Red Hat OpenShift cluster.
- Create a Docker registry secret in the Kubernetes project that grants access to the Watson NLP Runtime and pretrained models.
Tip: Podman provides a Docker-compatible command-line front end. Unless otherwise noted, all of the Docker commands in this tutorial should work for Podman if you simply alias the Docker CLI with the `alias docker=podman` shell command.
Steps
Step 1. Configure Knative
Note: Skip this step if you are using the sandbox environment.
The deployment approach that we use in this tutorial relies on capabilities of Knative Serving that are disabled by default. You'll configure Knative Serving to enable init containers and `emptyDir` volumes.
To apply the configuration, use the following command.
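A minimal sketch of this step, assuming Knative Serving is installed in the `knative-serving` namespace: the two flags below are the standard Knative Serving feature-flag names for init containers and `emptyDir` volumes, set in the `config-features` ConfigMap.

```bash
# Enable init containers and emptyDir volumes in Knative Serving.
# Assumes Knative Serving is installed in the knative-serving namespace.
oc patch configmap/config-features \
  --namespace knative-serving \
  --type merge \
  --patch '{"data":{"kubernetes.podspec-init-containers":"enabled","kubernetes.podspec-volumes-emptydir":"enabled"}}'
```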
Step 2. Clone the GitHub repository
Clone the repository that contains the code that is used in this tutorial.
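A sketch of the clone step; the repository URL and directory below are placeholders for the tutorial's actual repository, not its real location.

```bash
# <repository-url> and <repository-directory> are placeholders;
# substitute the repository referenced by this tutorial.
git clone <repository-url>
cd <repository-directory>
```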
Step 3. Deploy the model service
In this step, you create a Knative Service to run the Watson NLP runtime. When a service is created, Knative does the following:
- It creates a new immutable revision for this version of the application.
- It creates a route, ingress, service, and load balancer for your application.
- It automatically scales replicas based on request load, including scaling to zero active replicas.
Create the service by running the following command.
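The following is a sketch of what such a Knative Service manifest might look like, following the pattern described earlier: model images run as init containers that provision an `emptyDir` volume, which the runtime container then reads. The service name, secret name, image references, mount path, and the `LOCAL_MODELS_DIR` variable are placeholders and assumptions, not the tutorial's exact values.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: watson-nlp-kn
spec:
  template:
    spec:
      imagePullSecrets:
        - name: watson-nlp-registry-secret   # assumed name of the secret from the prerequisites
      initContainers:
        - name: syntax-model                 # one init container per pretrained model image
          image: <pretrained-model-image>    # placeholder for a Watson NLP model image
          volumeMounts:
            - name: models
              mountPath: /app/models         # assumed mount path shared with the runtime
      containers:
        - name: watson-nlp-runtime
          image: <watson-nlp-runtime-image>  # placeholder for the Watson NLP runtime image
          env:
            - name: LOCAL_MODELS_DIR         # assumed variable pointing the runtime at the models
              value: /app/models
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: models
              mountPath: /app/models
      volumes:
        - name: models
          emptyDir: {}                       # models are copied here by the init containers
```

With the manifest saved to a file, you would apply it with `oc apply -f <file>.yaml`.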
Verify that the service has been created.
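For example, assuming the service name used above:

```bash
oc get ksvc watson-nlp-kn
```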
You should see output similar to the following.
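Illustratively, `oc get ksvc` output follows this format; the values here are placeholders, not actual tutorial output.

```
NAME            URL                                                LATESTCREATED         LATESTREADY           READY   REASON
watson-nlp-kn   https://watson-nlp-kn-<project>.<cluster-domain>   watson-nlp-kn-00001   watson-nlp-kn-00001   True
```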
Use the following command to check the revisions of this service.
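For example, listing Knative revisions with the CLI:

```bash
oc get revisions
```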
Set the URL for the service in an environment variable.
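A sketch, assuming the service name `watson-nlp-kn` used above; the URL is read from the service's status with a jsonpath query.

```bash
export SERVICE_URL=$(oc get ksvc watson-nlp-kn -o jsonpath='{.status.url}')
echo "${SERVICE_URL}"
```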
Step 4. Test Knative autoscaling
With the parameters used when creating the service, Knative autoscales pods based on requests, including scaling to zero when there are no requests.
Run the following command to list the pods in your Red Hat OpenShift project.
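For example:

```bash
oc get pods
```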
Pods belonging to the Knative service should have the prefix `watson-nlp-kn`. Initially, there should be none. If you do see some, wait for a few minutes; they are automatically terminated. Run the following command to trigger the Knative service to start pods.
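A sketch of such a request loop, reusing the `SERVICE_URL` variable set earlier. The endpoint path and model ID follow common Watson NLP runtime REST conventions and are assumptions here; the same request appears in Step 5.

```bash
# Send a request every second to keep traffic flowing to the service.
# Endpoint path and model ID are assumptions; adjust them to your deployed model.
while true; do
  curl -s -X POST "${SERVICE_URL}/v1/watson.runtime.nlp.v1/NlpService/SyntaxPredict" \
    -H "accept: application/json" \
    -H "content-type: application/json" \
    -H "grpc-metadata-mm-model-id: syntax_izumo_lang_en_stock" \
    -d '{ "raw_document": { "text": "This is a test sentence." } }' > /dev/null
  sleep 1
done
```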
Use `Ctrl-C` to break out of the command. Use the following command to watch the pods being created in response to the requests and later terminated.
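For example:

```bash
oc get pods --watch
```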
The output is similar to the following.
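Illustratively, the watch shows the pod lifecycle in roughly this form; the pod names are placeholders, and the `2/2` ready count reflects the application container plus Knative's queue-proxy sidecar.

```
NAME                                    READY   STATUS            RESTARTS   AGE
watson-nlp-kn-00001-deployment-<hash>   0/2     Init:0/1          0          5s
watson-nlp-kn-00001-deployment-<hash>   0/2     PodInitializing   0          30s
watson-nlp-kn-00001-deployment-<hash>   2/2     Running           0          45s
watson-nlp-kn-00001-deployment-<hash>   2/2     Terminating       0          2m10s
```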
Use `Ctrl-C` to break out of the command.
Step 5. Test the service
In this step, you make an inference request on the model using the REST interface. Run the following command.
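A sketch of such a request; the endpoint path, model ID, and request body follow the Watson NLP runtime's REST conventions but are assumptions here, not the tutorial's exact command.

```bash
# Request syntax analysis from the served model over REST.
# The model ID is an assumption; use the ID of the model you deployed.
curl -s -X POST "${SERVICE_URL}/v1/watson.runtime.nlp.v1/NlpService/SyntaxPredict" \
  -H "accept: application/json" \
  -H "content-type: application/json" \
  -H "grpc-metadata-mm-model-id: syntax_izumo_lang_en_stock" \
  -d '{ "raw_document": { "text": "This is a test sentence." } }'
```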
You see output similar to the following example.
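The response is a JSON document of roughly this shape; the field names are assumptions based on the syntax model's output, and the elided values are left as gaps.

```
{
  "text": "This is a test sentence.",
  "producerId": { ... },
  "tokens": [ ... ],
  "sentences": [ ... ],
  "paragraphs": [ ... ]
}
```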
Summary
In this tutorial, you deployed a pretrained Watson NLP model on a Red Hat OpenShift cluster using a Knative service. Model images are specified as init containers in the Kubernetes manifest. You further observed Knative autoscaling, including scaling to zero.
Take a look at more embeddable AI content on IBM Developer, or try out the IBM Natural Language Processing Library for Embed.