Tutorial
Serve Watson NLP models on a Google Kubernetes Engine cluster with Knative
Learn how to install Knative from scratch on a GKE cluster, and serve pretrained Watson NLP models
Archive date: 2024-05-30
This content is no longer being updated or maintained. The content is provided "as is." Given the rapid evolution of technology, some content, steps, or illustrations may have changed.

With IBM Watson NLP, IBM introduced a common library for natural language processing, document understanding, translation, and trust. IBM Watson NLP brings everything under one umbrella for consistency and ease of development and deployment. This tutorial walks you through the steps to serve pretrained Watson NLP models with Knative deployed on a Google Kubernetes Engine (GKE) cluster.
Knative is an open source platform, built on top of Kubernetes, that simplifies the development and deployment of modern cloud-native applications by providing powerful tools for deploying, managing, and scaling serverless and event-driven workloads. Knative's autoscaling capabilities automatically scale serverless workloads up and down, even to zero, based on incoming traffic, which helps to reduce costs and improve application performance.
This tutorial explains how to install Knative from scratch on a GKE cluster, and serve pretrained Watson NLP models through a few Knative Services that are created for the Watson NLP runtime.
Prerequisites
To follow this tutorial, you need:
- Your entitlement key to access the IBM Entitled Registry
- Access to a project in Google Cloud
- If you don't use Cloud Shell, make sure that you have the tools used in this tutorial installed on your local machine: the Google Cloud CLI (gcloud), kubectl, Docker (or Podman), grpcurl, and curl
Tip: Podman provides a Docker-compatible command-line front end. Unless otherwise noted, all of the Docker commands in this tutorial should work for Podman if you simply alias the Docker CLI with the `alias docker=podman` shell command.
Deploy a GKE standard cluster
In this tutorial, you first deploy a GKE standard cluster. You could use an existing cluster if you have cluster-admin access and enough resources in the cluster.
Step 1. Launch Google Cloud Shell
Cloud Shell is an interactive shell environment for Google Cloud that lets you learn and experiment with Google Cloud and manage your projects and resources from your web browser. With Cloud Shell, the Google Cloud CLI and other utilities that you need are preinstalled, fully authenticated, up-to-date, and always available when you need them. You can also install additional packages and tools that you need into the Cloud Shell environment.
If you have access to multiple Google Cloud projects, make sure the `GOOGLE_CLOUD_PROJECT` environment variable in Cloud Shell is set to the correct project. You can use the `gcloud config set project` command to switch projects, if necessary.
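For example (`my-watson-project` is a placeholder for your project ID):

```sh
# Check which project Cloud Shell is currently pointed at.
echo "$GOOGLE_CLOUD_PROJECT"

# Switch to a different project if necessary.
gcloud config set project my-watson-project
```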
Step 2. Create the cluster with the gcloud CLI
Set the variables for the region and cluster name of your choice.
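For example, with illustrative values:

```sh
# Pick any GKE region and a name for the new cluster; these values are examples.
export REGION=us-central1
export CLUSTER_NAME=watson-nlp-knative
```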
Run the following command to create a standard cluster.
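A minimal invocation might look like the following; the machine type and node count here are assumptions, so size them to fit the Watson NLP runtime and the models you plan to serve.

```sh
# Create a standard (non-Autopilot) regional GKE cluster.
gcloud container clusters create "$CLUSTER_NAME" \
  --region "$REGION" \
  --machine-type e2-standard-4 \
  --num-nodes 1
```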
It takes a few minutes for the cluster creation to complete. When it's done, you can check it with the `gcloud container clusters list` command. The `gcloud container clusters create` command also updates the kubeconfig file (set by the `KUBECONFIG` environment variable, or `$HOME/.kube/config` by default) with the appropriate credentials and endpoint information to point `kubectl` to the newly created cluster.
Install Knative using the Knative Operator
Knative provides a Kubernetes Operator to install, configure, and manage Knative. You can install the Serving component, the Eventing component, or both on your cluster. In this tutorial, you install only Knative Serving.
Step 3. Install the Knative Operator
Run the following command to install the latest stable Operator release in the `default` namespace.
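The Operator manifest is published on the knative/operator GitHub releases page; applying the latest release might look like this:

```sh
# Install the latest released Knative Operator into the default namespace.
kubectl apply -f https://github.com/knative/operator/releases/latest/download/operator.yaml
```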
Step 4. Verify your Knative Operator installation
To check the Operator deployment status and to track the Operator log, use the following commands.
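```sh
# Check that the Operator deployment is available.
kubectl get deployment knative-operator

# Follow the Operator logs.
kubectl logs -f deploy/knative-operator
```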
Step 5. Install Knative Serving
To install Knative Serving, you must create a custom resource (CR), add a networking layer to the CR, and configure DNS.
Create the Knative Serving CR.
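A minimal CR, following the Knative Operator documentation, looks like this:

```sh
# The CR must live in the knative-serving namespace, so create it first.
kubectl create namespace knative-serving

kubectl apply -f - <<EOF
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
EOF
```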
NOTE: When you don't specify a version by using `spec.version`, the Operator defaults to the latest version.
Step 6. Install the networking layer
The Knative Operator can configure the Knative Serving component with different networking layer options. Istio is the default networking layer if the ingress is not specified in the Knative Serving CR. If you choose the default Istio networking layer, you must install Istio on your cluster.
Complete the following steps; a combined sketch follows the list.

- Download Istio 1.17.1 into a local directory, for example, `$HOME/istio`.
- Create an alias for `istioctl` in the current shell.
- Install Istio on your cluster.
- Fetch the External IP of the Istio Ingress Gateway.
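A sketch of these steps, assuming the directory layout created by Istio's download script:

```sh
# Download Istio 1.17.1 into $HOME (the script creates $HOME/istio-1.17.1).
cd "$HOME"
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.17.1 sh -

# Alias istioctl in the current shell.
alias istioctl="$HOME/istio-1.17.1/bin/istioctl"

# Install Istio with its default profile.
istioctl install -y

# Fetch the external IP of the Istio ingress gateway.
kubectl get svc istio-ingressgateway -n istio-system \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```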
Step 7. Configure DNS
Knative uses DNS names to decide where to route the incoming traffic. To configure DNS for Knative, take the External IP of the ingress gateway that you get from the previous step, and create a wildcard `A` record with your DNS provider. The following code shows an example.
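For example, a zone-file entry with a placeholder IP and the `knative.example.com` domain suffix used later in this tutorial:

```
*.knative.example.com.   300   IN   A   203.0.113.10
```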
Tip: If you don't have the required privileges to create this DNS record, you could use a record in the hosts file on your local machine or the `-H "Host:"` command-line option of the `curl` and `grpcurl` commands to make REST and gRPC calls, as shown in the following example.
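For example (the hostname and IP are placeholders):

```sh
# Send the request to the ingress IP directly, supplying the Knative
# hostname in the Host header instead of relying on DNS.
curl -H "Host: my-service-my-namespace.knative.example.com" http://203.0.113.10/
```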
Step 8. Verify the Knative Serving deployment
Monitor the Knative deployment.
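```sh
# Watch the Knative Serving deployments come up.
kubectl get deployment -n knative-serving --watch
```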
If Knative Serving has been deployed successfully, all deployments of Knative Serving show a `READY` status.

Check the status of the Knative Serving custom resource.
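For example:

```sh
kubectl get KnativeServing knative-serving -n knative-serving
```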
If Knative Serving is successfully installed, the READY column in the output shows True.
Step 9. Configure Knative Serving
The Knative Operator manages the configuration of a Knative installation by propagating values from the `KnativeServing` and `KnativeEventing` custom resources to the system ConfigMaps. Any manual updates to the ConfigMaps are overwritten by the Operator. Knative has multiple ConfigMaps that are named with the `config-` prefix. All Knative ConfigMaps are created in the same namespace as the custom resource, which is `knative-serving` for Knative Serving. The `spec.config` in the Knative custom resources has one `<name>` entry for each ConfigMap, named `config-<name>`, the value of which is used for the ConfigMap's `data`.
Update the following Knative Serving configuration settings.

- Specify the domain suffix for your Knative installation, for example, `knative.example.com`.
- Specify the Golang text template string to use when constructing a Knative Service's DNS name: `"{{.Name}}-{{.Namespace}}.{{.Domain}}"`.
- Enable Init Containers support.
- Enable emptyDir volume support.
To apply the configuration, use the following command.
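A sketch that maps the four settings above onto `spec.config`, using the `config-domain`, `config-network`, and `config-features` ConfigMap keys:

```sh
kubectl apply -f - <<EOF
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  config:
    domain:
      # Replace with the domain suffix that you control.
      knative.example.com: ""
    network:
      domain-template: "{{.Name}}-{{.Namespace}}.{{.Domain}}"
    features:
      kubernetes.podspec-init-containers: "enabled"
      kubernetes.podspec-volumes-emptydir: "enabled"
EOF
```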
Deploy Watson NLP runtime with pretrained models
With Knative Serving installed and configured on your GKE cluster, it's time to deploy the Watson NLP runtime as a Knative Service. In fact, you deploy two Knative service instances: one for gRPC and another for REST. More on that later.
Step 10. Create a namespace
Create a dedicated namespace for deploying the two Knative services, as shown in the following example.
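For example, with `watson-demo` as an illustrative name (the rest of this tutorial assumes it):

```sh
kubectl create namespace watson-demo
```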
Step 11. Create a Secret for the IBM Entitlement Key
The IBM Entitled Registry contains various container images for the Watson NLP Runtime and pretrained models. You can obtain the entitlement key from the container software library and store it in a Kubernetes Secret resource, which is needed for your deployment to access those images.
The following command creates a Secret that is named `ibm-entitlement-key` to store your IBM Entitlement Key.
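The IBM Entitled Registry is hosted at `cp.icr.io` and uses `cp` as the username; a sketch, assuming your key is in the `IBM_ENTITLEMENT_KEY` environment variable and the `watson-demo` namespace from the previous step:

```sh
kubectl create secret docker-registry ibm-entitlement-key \
  --docker-server=cp.icr.io \
  --docker-username=cp \
  --docker-password="$IBM_ENTITLEMENT_KEY" \
  -n watson-demo
```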
Step 12. Clone the GitHub repository
Clone the repository that contains the Knative Service manifests that are used in this tutorial, and then go to the directory of this tutorial.
Step 13. Create the Knative Services
The Watson NLP runtime runs both a gRPC server and a REST server, on ports 8085 and 8080 respectively. Because Knative Serving doesn't allow multiple ports in a service, you create two services instead, using the same Watson NLP runtime and models, with each exposing a different port. In both Knative Service manifests, the Watson NLP pretrained model images are specified as Init Containers. These Init Containers run to completion before the main container starts in the Pod. They provision models into an emptyDir volume that is defined in the Pod. When the Watson NLP runtime container starts, it loads the models and begins serving them.
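The exact manifests are in the cloned repository. As an illustration of the pattern, a sketch of the gRPC variant might look like the following; the image tags, the model image, and the environment variables are assumptions to verify against the repository and the IBM documentation.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: watson-nlp-runtime-grpc
  namespace: watson-demo
spec:
  template:
    spec:
      imagePullSecrets:
        - name: ibm-entitlement-key
      initContainers:
        # Each pretrained model image runs to completion first and copies
        # its model into the shared emptyDir volume.
        - name: syntax-model
          image: cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_en_stock:1.0.7
          env:
            - name: ACCEPT_LICENSE
              value: "true"
          volumeMounts:
            - name: models
              mountPath: /app/models
      containers:
        - name: watson-nlp-runtime
          image: cp.icr.io/cp/ai/watson-nlp-runtime:1.0.18
          env:
            - name: ACCEPT_LICENSE
              value: "true"
            - name: LOCAL_MODELS_DIR
              value: /app/models
          ports:
            # The h2c name tells Knative the container speaks HTTP/2,
            # which gRPC requires. The REST variant exposes 8080 instead.
            - name: h2c
              containerPort: 8085
          volumeMounts:
            - name: models
              mountPath: /app/models
      volumes:
        - name: models
          emptyDir: {}
```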
Create one Knative Service resource for the gRPC server and another for the REST server by applying the two manifests from the cloned repository.
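With placeholder file names for the two manifests:

```sh
kubectl apply -f <grpc-service-manifest>.yaml -n watson-demo
kubectl apply -f <rest-service-manifest>.yaml -n watson-demo
```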
It might take a few minutes for the `watson-nlp-runtime` container to be created. You can check the progress by watching the events.
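For example, assuming the `watson-demo` namespace:

```sh
# Watch events in the namespace while the revision starts up.
kubectl get events -n watson-demo --watch
```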
When a Knative Service is created, Knative creates a set of Kubernetes resources. This includes a Knative Configuration resource, which defines the desired state of the service, and an initial Knative Revision based on that configuration. Knative creates a Kubernetes Deployment resource to manage the scaling and replication of the service and a Kubernetes Service resource to expose the service to other components within the Kubernetes cluster. Knative also creates an Istio VirtualService and a Knative Ingress to handle incoming traffic. Autoscaling is handled by the Knative Pod Autoscaler (KPA), or by the Kubernetes HPA without scale-to-zero support. This orchestration provides an easy way to manage the deployment, scaling, and routing of a Knative Service, letting you focus on developing and delivering high-quality applications.
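You can inspect the generated resources with standard `kubectl` short names:

```sh
# List the Knative-managed resources that back the two services.
kubectl get ksvc,configuration,revision,route -n watson-demo
```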
Access the Knative Services
With the Watson NLP runtime up and running and its API services ready to accept incoming requests, you can make gRPC and REST calls by using a command-line utility like `grpcurl` and `curl`.
Step 14. Use grpcurl to make a gRPC call
You can send inference requests to the gRPC service endpoint by using `grpcurl` commands. Either a Protocol Buffers source file that is specified by `-proto` or a compiled "protoset" file that is specified by `-protoset` is needed for making gRPC calls to a gRPC service that doesn't provide gRPC server reflection.
The proto source files can be extracted from the Watson NLP runtime container image as follows.
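A sketch using a temporary container; the image tag and the in-image path of the proto files are assumptions to verify.

```sh
# Create (but don't start) a container from the runtime image, copy the
# proto files out, then remove the container.
docker create --name watson-nlp-runtime cp.icr.io/cp/ai/watson-nlp-runtime:1.0.18
docker cp watson-nlp-runtime:/app/protos ./protos
docker rm watson-nlp-runtime
```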
Make a sample gRPC call.
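A sketch of a syntax-analysis request; the hostname follows the domain template configured earlier, and the proto file name, model ID, and request shape are assumptions to check against the extracted protos and the Models catalog.

```sh
grpcurl -plaintext \
  -proto ./protos/common-service.proto \
  -H 'mm-model-id: syntax_izumo_lang_en_stock' \
  -d '{"raw_document": {"text": "IBM Watson NLP runs on Knative."}, "parsers": ["token"]}' \
  watson-nlp-runtime-grpc-watson-demo.knative.example.com:80 \
  watson.runtime.nlp.v1.NlpService.SyntaxPredict
```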
Tip: The metadata value for `mm-model-id` is the Model ID of the pretrained model that is found in the Models catalog.
If you get a JSON response with the analysis results, the gRPC service is working properly.
Step 15. Use curl to make a REST call
You can send inference requests to the REST service endpoint by using `curl` commands.
Make a sample REST call.
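A sketch of the equivalent REST request; the hostname, path, and model ID are assumptions mirroring the gRPC example.

```sh
curl -s -X POST \
  -H 'Content-Type: application/json' \
  -H 'grpc-metadata-mm-model-id: syntax_izumo_lang_en_stock' \
  -d '{"raw_document": {"text": "IBM Watson NLP runs on Knative."}, "parsers": ["token"]}' \
  'http://watson-nlp-runtime-rest-watson-demo.knative.example.com/v1/watson.runtime.nlp.v1/NlpService/SyntaxPredict'
```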
Tip: The metadata value for `grpc-metadata-mm-model-id` is the Model ID of the pretrained model that is found in the Models catalog.
If you get a JSON response with the analysis results, the REST service is working properly.
Step 16. Clean up
Don't forget to clean up afterward to avoid paying for the cloud resources that you no longer need.
Delete the GKE cluster.
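```sh
# Delete the cluster and its node pools.
gcloud container clusters delete "$CLUSTER_NAME" --region "$REGION" --quiet
```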
Summary
In this tutorial, you learned how to serve pretrained Watson NLP models with Knative deployed on a Google Kubernetes Engine cluster.
Take a look at more embeddable AI content on IBM Developer, or try out the IBM Natural Language Processing Library for Embed.