
Create a conversion webhook with the Operator SDK

Introduction

This tutorial walks you through creating a conversion webhook with the Operator SDK and shows you how to migrate existing custom resources from an old version to a new one. A CustomResourceDefinition (CRD) defines a new resource type brought into the Kubernetes cluster, and its versions field lets you support multiple versions of the custom resources that you develop. CRDs with different versions can have different schemas, and conversion webhooks can convert custom resources between versions.

When you develop a Kubernetes operator to manage an operand, you might have to manage the CRD across different versions as the operator evolves. In this situation, you need to make sure that your operator supports backward compatibility at the API level, which means the old schema of the CRD still works or can be converted into the new schema. For more information about different versions in the CRD, refer to Versions in CustomResourceDefinitions in the Kubernetes documentation.
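To make this concrete, here is a sketch of how a multi-version CRD declares which versions are served, which one is stored, and how conversion is delegated to a webhook. (The values are illustrative and required fields such as the schema are omitted; the manifests generated later in this tutorial are authoritative.)

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: memcacheds.cache.example.com
spec:
  group: cache.example.com
  versions:
    - name: v1alpha1
      served: true       # the API server still accepts this version
      storage: false
    - name: v1beta1
      served: true
      storage: true      # objects are persisted at this version
  conversion:
    strategy: Webhook    # a webhook converts objects between versions
```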

In this tutorial, you:

  1. Build the latest Operator SDK.
  2. Install cert-manager for certificate management.
  3. Create a project with different CRD versions.
  4. Migrate the existing resources.
  5. See how the migration works.

Prerequisites

To replicate the steps of this tutorial, you need to install the following tools:

  • Golang: An open source programming language. (Configure your $GOPATH variable to add $GOPATH/bin into the PATH.)
  • Git: A distributed version control system for your project.
  • Ko: An image builder for Go applications.
  • Kubectl: A Kubernetes command-line tool that runs commands against the Kubernetes cluster.

In addition, you need to set up a Kubernetes cluster v1.16 or newer as your environment. You can use a Kubernetes service from any major cloud provider, such as IBM Cloud Kubernetes Service. If you choose to use a local Kubernetes cluster on your own machine, you can select Minikube or Docker Desktop, depending on your operating system.

Estimated time

Based on your familiarity with the Operator SDK, it will take 15 to 20 minutes for you to go through the steps in this tutorial.

Build and install the Operator SDK

You could install the official Operator SDK by following the Operator SDK CLI installation instructions. However, I recommend that you build the binary based on the latest commit so that you have the most up-to-date Operator SDK, which helps you avoid issues that have already been fixed.

Open a terminal, and create the directory at $GOPATH/src/github.com/operator-framework (if it’s not there already):

mkdir $GOPATH/src/github.com/operator-framework

Go to the directory that you created:

cd $GOPATH/src/github.com/operator-framework

Download the source code of operator-sdk:

git clone git@github.com:operator-framework/operator-sdk.git

Go to the home directory of the operator-sdk:

cd operator-sdk

Build and install the Operator SDK:

make install

This installs the operator-sdk binary at $GOPATH/bin/operator-sdk, so you can now run commands that begin with operator-sdk.

Install cert-manager

The Kubernetes add-on, cert-manager, automatically manages and issues TLS certificates from various issuing sources. It ensures certificates are valid and updated periodically and attempts to renew certificates at an appropriate time before expiration. The webhook in Kubernetes requires a TLS certificate that the API server trusts, so you need to set up cert-manager to issue and manage the TLS certificate. Install the latest version of cert-manager with the following command:

kubectl apply -f https://github.com/jetstack/cert-manager/releases/latest/download/cert-manager.yaml
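For reference, the certificate wiring that this project scaffolds later looks roughly like the following sketch: a self-signed Issuer backing a Certificate for the webhook service. (Names and namespaces here are illustrative; the generated config/certmanager/certificate.yaml is authoritative.)

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer
  namespace: system
spec:
  selfSigned: {}          # issues a self-signed certificate
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: serving-cert
  namespace: system
spec:
  dnsNames:               # must match the webhook service DNS names
    - webhook-service.system.svc
    - webhook-service.system.svc.cluster.local
  issuerRef:
    kind: Issuer
    name: selfsigned-issuer
  secretName: webhook-server-cert   # mounted by the manager pod
```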

In the example in the next section, you create two CRD versions to see how the webhook can convert the custom resource from one to the other. You can give the CRD versions any valid names, but this example uses v1alpha1 for the first CRD version and v1beta1 for the second.

Create the project with the CRD version v1alpha1

Create a working directory under $GOPATH/src for the project. This tutorial uses the path github.com/example.

mkdir $GOPATH/src/github.com/example
cd $GOPATH/src/github.com/example
mkdir memcached-operator
cd memcached-operator

Initialize the project with Git:

git init

Initialize the project with operator-sdk (you can change the domain and repo names based on your needs):

operator-sdk init --domain example.com --repo github.com/example/memcached-operator

This command generates the following:

  • A go.mod file that has project dependencies
  • A PROJECT file that stores project configuration
  • A Makefile that has several useful make targets for the project
  • Several YAML files for project deployment under the config directory
  • A main.go file, which creates the manager that runs the project controllers

Create a new API and controller for the CRD named Memcached at v1alpha1:

operator-sdk create api --group cache --version v1alpha1 --kind Memcached --resource --controller

This command scaffolds the Memcached resource API at api/v1alpha1/memcached_types.go and the controller at controllers/memcached_controller.go.

After running the previous commands, you should have a project structure like the following:

Memcached project structure

Next, edit the file api/v1alpha1/memcached_types.go. Change MemcachedSpec and MemcachedStatus for the Memcached CR as follows:

// MemcachedSpec defines the desired state of Memcached
type MemcachedSpec struct {
     //+kubebuilder:validation:Minimum=0
     // Size is the size of the memcached deployment
     Size int32 `json:"size"`
}

// MemcachedStatus defines the observed state of Memcached
type MemcachedStatus struct {
     // Nodes are the names of the memcached pods
     Nodes []string `json:"nodes"`
}

To update the file api/v1alpha1/zz_generated.deepcopy.go, invoke the controller-gen utility in the Makefile located under the root directory of your project:

make generate

Generate the CRD manifests, including WebhookConfiguration, ClusterRole, and CustomResourceDefinition objects for this project:

make manifests

In the config/crd/bases directory, you should now see the cache.example.com_memcacheds.yaml file, and under the config/crd/patches directory, you should see two YAML files: cainjection_in_memcacheds.yaml and webhook_in_memcacheds.yaml:

Manifest YAML files

Because the focus of this tutorial is the conversion webhook, this example uses a simplified controller design. The only part of the controller you need to understand is the Reconcile function: it does little more than print a message and fill in the status, and the controller only watches for changes on the newly created CR.

In the controllers/memcached_controller.go file, change the Reconcile function into:

//+kubebuilder:rbac:groups=cache.example.com,resources=memcacheds,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=cache.example.com,resources=memcacheds/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=cache.example.com,resources=memcacheds/finalizers,verbs=update
//+kubebuilder:rbac:groups=core,resources=pods,verbs=get;list
func (r *MemcachedReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
     log := ctrl.Log.WithValues("memcached", req.NamespacedName)
     log.Info("Memcached resource has changed.")
     latest := &cachev1alpha1.Memcached{}
     err := r.Get(ctx, req.NamespacedName, latest)
     if err != nil {
         if errors.IsNotFound(err) {
             log.Info("Memcached resource not found. Ignoring since object must be deleted")
             return ctrl.Result{}, nil
         }
         // Error reading the object - requeue the request.
         log.Error(err, "Failed to get Memcached")
         return ctrl.Result{}, err
     }

     // Create an array of three pods: pod1, pod2 and pod3, and set the nodes with the array in the status.
     podNames := []string{"pod1", "pod2", "pod3"}
     latest.Status.Nodes = podNames
     err = r.Status().Update(ctx, latest)
     if err != nil {
         log.Error(err, "Failed to update Memcached status")
         return ctrl.Result{}, err
     }
     return ctrl.Result{}, nil
}

The pod names are hardcoded here for testing purposes only. They are arbitrary names used to fill in the nodes field in the status; the function does not actually create any pods. It is simply a minimal placeholder implementation of the Reconcile function. In addition, the +kubebuilder markers above the Reconcile function specify permissions and are used to generate the RBAC manifests.
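For illustration, the markers above cause make manifests to emit ClusterRole rules of roughly the following shape in config/rbac/role.yaml. (This is a sketch, not the verbatim generated output.)

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: manager-role
rules:
  # From the memcacheds marker: full control over the CR.
  - apiGroups: ["cache.example.com"]
    resources: ["memcacheds"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  # From the memcacheds/status marker: status subresource access.
  - apiGroups: ["cache.example.com"]
    resources: ["memcacheds/status"]
    verbs: ["get", "update", "patch"]
  # From the core pods marker.
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
```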

In controllers/memcached_controller.go, change the SetupWithManager function to the following:

func (r *MemcachedReconciler) SetupWithManager(mgr ctrl.Manager) error {
     return ctrl.NewControllerManagedBy(mgr).
         For(&cachev1alpha1.Memcached{}).
         WithOptions(controller.Options{MaxConcurrentReconciles: 2}).
         Complete(r)
}

Because you just added the RBAC permissions, you need to generate the ClusterRole manifest at config/rbac/role.yaml. Run the following command again:

make manifests

Update the CR sample at config/samples/cache_v1alpha1_memcached.yaml to the following:

apiVersion: v1
kind: Namespace
metadata:
   name: memcached-sample
---
apiVersion: cache.example.com/v1alpha1
kind: Memcached
metadata:
   name: memcached-sample
   namespace: memcached-sample
spec:
   size: 3

So far, you have created the basic structure of the memcached operator for the v1alpha1 version CRD. Let’s save the work with Git. Add the following folders and files:

git add api
git add config
git add controllers
git add hack
git add main.go
git add Dockerfile
git add go.mod
git add go.sum
git add PROJECT
git add Makefile
git add .dockerignore
git add .gitignore

Use the git command to save the commit:

git commit -a

Create a branch to save the work of v1alpha1 CRD:

git checkout -b v1alpha1

Build and test the operator

You can choose any image repository to save the images. The following steps use docker.io as the image repository.

Log in to Docker first with the following command:

docker login

Set the $USER variable, replacing <name> with your username registered with docker.io, and build and push the image. The tag v0.0.1 is used for the v1alpha1 CRD:

export USER=<name>
make docker-build docker-push IMG=docker.io/$USER/memcached-operator:v0.0.1

After you successfully publish the image, run the following command to deploy the operator:

make deploy IMG=docker.io/$USER/memcached-operator:v0.0.1

As defined in the config/default/kustomization.yaml file, the default namespace for the operator is memcached-operator-system. Check the deployment of the operator with:

kubectl get deploy -n memcached-operator-system

Check the log of the operator with the following command:

kubectl logs -f deploy/memcached-operator-controller-manager -n memcached-operator-system -c manager

The deployment launches two containers: manager and proxy. The operator writes its log messages to the manager container, which is why you specify the container name when you show and track the log messages.

Create the v1alpha1 CR with the following command:

kubectl apply -f config/samples/cache_v1alpha1_memcached.yaml

Once you run this command, you should see the following message in the log of the operator:

INFO    controllers.Memcached    Memcached resource has changed.    {"memcached": "memcached-sample/memcached-sample"}

This means that the reconcile loop is called as the CR is created. Check the CR with the following command:

kubectl get Memcached memcached-sample -n memcached-sample -oyaml

This command yields the contents of the CR in YAML format. You have everything for the v1alpha1 CRD in place. Use the following command to remove the CR:

kubectl delete Memcached memcached-sample -n memcached-sample

Use this command to remove the operator:

make undeploy IMG=docker.io/$USER/memcached-operator:v0.0.1

Create a new CRD version

Switch back to the master branch and continue the development:

git checkout master

Create the new v1beta1 API:

operator-sdk create api --group cache --version v1beta1 --kind Memcached

You do not need to create the controller this time because you already have one; you only need the resource. The only required change is to make the controller reconcile the v1beta1 CR instead of the v1alpha1 CR. When you develop a Kubernetes operator, avoid having the controller reconcile multiple versions of the same CR.

Create a field called replicaSize for the v1beta1 CRD. This is the crucial (and only) change compared to the v1alpha1 CRD. Let’s change the file api/v1beta1/memcached_types.go.

Define the API for the Memcached CR as shown here:

// MemcachedSpec defines the desired state of Memcached
type MemcachedSpec struct {
     //+kubebuilder:validation:Minimum=0
     // ReplicaSize is the size of the memcached deployment
     ReplicaSize int32 `json:"replicaSize"`
}

// MemcachedStatus defines the observed state of Memcached
type MemcachedStatus struct {
     // Nodes are the names of the memcached pods
     Nodes []string `json:"nodes"`
}

Add the marker +kubebuilder:storageversion to indicate v1beta1 is the storage version:

//+kubebuilder:object:root=true
//+kubebuilder:subresource:status
//+kubebuilder:storageversion

// Memcached is the Schema for the memcacheds API
type Memcached struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   MemcachedSpec   `json:"spec,omitempty"`
    Status MemcachedStatus `json:"status,omitempty"`
}

In the file controllers/memcached_controller.go, replace all instances of cachev1alpha1 with cachev1beta1 and all instances of v1alpha1 with v1beta1. Make sure the controller switches to watch the v1beta1 CR.

Then, update the generated code and regenerate the CRD manifests:

make generate
make manifests

Both v1alpha1 and v1beta1 CRDs are serving, but only the v1beta1 CRD is stored.

Change the CR sample at config/samples/cache_v1beta1_memcached.yaml into the following contents:

apiVersion: v1
kind: Namespace
metadata:
  name: memcached-sample
---
apiVersion: cache.example.com/v1beta1
kind: Memcached
metadata:
  name: memcached-sample
  namespace: memcached-sample
spec:
  replicaSize: 3

Create the conversion webhook for the v1beta1 resource:

operator-sdk create webhook --conversion --version v1beta1 --kind Memcached --group cache --force

Next, you need to implement the conversion.Hub and conversion.Convertible interfaces for your CRD types. You use v1beta1 as both the storage version and the Hub, which means any other resource version converts to and from v1beta1. Create a file named memcached_conversion.go under api/v1beta1 with the following contents:

package v1beta1

// Hub marks this type as a conversion hub.
func (*Memcached) Hub() {}

The v1alpha1 resource needs to implement the conversion.Convertible interface so that it can convert to and from the v1beta1 resource. In this example, the size attribute in v1alpha1 maps to replicaSize in v1beta1.

Create a file named memcached_conversion.go under api/v1alpha1 with the following contents:

package v1alpha1

import (
     "github.com/example/memcached-operator/api/v1beta1"
     "sigs.k8s.io/controller-runtime/pkg/conversion"
)

// ConvertTo converts this Memcached to the Hub version (v1beta1).
func (src *Memcached) ConvertTo(dstRaw conversion.Hub) error {
     dst := dstRaw.(*v1beta1.Memcached)
     dst.Spec.ReplicaSize = src.Spec.Size
     dst.ObjectMeta = src.ObjectMeta
     dst.Status.Nodes = src.Status.Nodes
     return nil
}

// ConvertFrom converts from the Hub version (v1beta1) to this version.
func (dst *Memcached) ConvertFrom(srcRaw conversion.Hub) error {
     src := srcRaw.(*v1beta1.Memcached)
     dst.Spec.Size = src.Spec.ReplicaSize
     dst.ObjectMeta = src.ObjectMeta
     dst.Status.Nodes = src.Status.Nodes
     return nil
}

Do not forget to set the ObjectMeta. Update the generated code and regenerate the CRD manifests once more with the commands make generate and make manifests.
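As a sanity check, the round-trip logic can be exercised in isolation. The following is a self-contained sketch that uses simplified stand-in types (ObjectMeta, MemcachedV1Alpha1, and MemcachedV1Beta1 here are illustrative stand-ins, not the generated API types or the controller-runtime conversion interfaces) to show that the spoke-hub-spoke round trip preserves the spec, status, and ObjectMeta:

```go
package main

import "fmt"

// ObjectMeta is a simplified stand-in for metav1.ObjectMeta.
type ObjectMeta struct{ Name, Namespace string }

// MemcachedV1Alpha1 mirrors the v1alpha1 API: the spec field is Size.
type MemcachedV1Alpha1 struct {
	ObjectMeta ObjectMeta
	Size       int32
	Nodes      []string
}

// MemcachedV1Beta1 mirrors the v1beta1 (hub) API: the spec field is ReplicaSize.
type MemcachedV1Beta1 struct {
	ObjectMeta  ObjectMeta
	ReplicaSize int32
	Nodes       []string
}

// ConvertTo mirrors spoke-to-hub conversion: copy ObjectMeta,
// map Size to ReplicaSize, and carry the status nodes over.
func (src *MemcachedV1Alpha1) ConvertTo(dst *MemcachedV1Beta1) {
	dst.ObjectMeta = src.ObjectMeta
	dst.ReplicaSize = src.Size
	dst.Nodes = src.Nodes
}

// ConvertFrom mirrors hub-to-spoke conversion.
func (dst *MemcachedV1Alpha1) ConvertFrom(src *MemcachedV1Beta1) {
	dst.ObjectMeta = src.ObjectMeta
	dst.Size = src.ReplicaSize
	dst.Nodes = src.Nodes
}

func main() {
	orig := MemcachedV1Alpha1{
		ObjectMeta: ObjectMeta{Name: "memcached-sample", Namespace: "memcached-sample"},
		Size:       3,
		Nodes:      []string{"pod1", "pod2", "pod3"},
	}

	var hub MemcachedV1Beta1
	orig.ConvertTo(&hub)

	var back MemcachedV1Alpha1
	back.ConvertFrom(&hub)

	fmt.Println("hub replicaSize:", hub.ReplicaSize)
	fmt.Println("round trip intact:", back.Size == orig.Size && back.ObjectMeta == orig.ObjectMeta)
}
```

The real ConvertTo and ConvertFrom methods do exactly this mapping, only against the generated types and the conversion.Hub argument.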

Enable the webhook and the certificate manager in manifests

Go through a few kustomization.yaml files under config/crd, config/default, and config/webhook, and uncomment or comment out a few lines.

For config/crd/kustomization.yaml, uncomment the following lines:

#- patches/webhook_in_memcacheds.yaml
#- patches/cainjection_in_memcacheds.yaml

For config/default/kustomization.yaml, uncomment the following lines:

#- ../webhook
#- ../certmanager
#- manager_webhook_patch.yaml

Also uncomment all the lines below vars:

#- name: CERTIFICATE_NAMESPACE # namespace of the certificate CR
#  objref:
#    kind: Certificate
#    group: cert-manager.io
#    version: v1
#    name: serving-cert # this name should match the one in certificate.yaml
#  fieldref:
#    fieldpath: metadata.namespace
#- name: CERTIFICATE_NAME
#  objref:
#    kind: Certificate
#    group: cert-manager.io
#    version: v1
#    name: serving-cert # this name should match the one in certificate.yaml
#- name: SERVICE_NAMESPACE # namespace of the service
#  objref:
#    kind: Service
#    version: v1
#    name: webhook-service
#  fieldref:
#    fieldpath: metadata.namespace
#- name: SERVICE_NAME
#  objref:
#    kind: Service
#    version: v1
#    name: webhook-service

For config/webhook/kustomization.yaml, comment out the following line:

- manifests.yaml

Change the file config/crd/patches/webhook_in_memcacheds.yaml by adding conversionReviewVersions: ["v1alpha1", "v1beta1"] under spec.conversion.webhook, like this:

# The following patch enables a conversion webhook for the CRD
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
   name: memcacheds.cache.example.com
spec:
   conversion:
     strategy: Webhook
     webhook:
       conversionReviewVersions: ["v1alpha1", "v1beta1"]
       clientConfig:
         service:
           namespace: system
           name: webhook-service
           path: /convert

Save the v1beta1 work with Git:

git add api
git add config

Commit the change:

git commit -a

Save them into a new branch:

git checkout -b v1beta1

Build the image for v1beta1:

export USER=<name>
make docker-build docker-push IMG=docker.io/$USER/memcached-operator:v0.0.2

Replace <name> with your username registered with docker.io. The tag v0.0.2 is used for the v1beta1 CRD. Deploy the operator with the v1beta1 resource:

make deploy IMG=docker.io/$USER/memcached-operator:v0.0.2

Check the deployment of the operator with the following:

kubectl get deploy -n memcached-operator-system

Check the log with the following command:

kubectl logs -f deploy/memcached-operator-controller-manager -n memcached-operator-system -c manager

Create the v1alpha1 CR again with the following command:

kubectl apply -f config/samples/cache_v1alpha1_memcached.yaml

This time, you get the v1beta1 resource saved in the cluster. The v1alpha1 resource is automatically converted into v1beta1. Check the CR:

kubectl get Memcached memcached-sample -n memcached-sample -oyaml

You should see output similar to the following:

apiVersion: cache.example.com/v1beta1
kind: Memcached
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"cache.example.com/v1alpha1","kind":"Memcached","metadata":{"annotations":{},"name":"memcached-sample","namespace":"memcached-sample"},"spec":{"size":3}}
  creationTimestamp: "2021-07-08T18:14:08Z"
  generation: 1
  name: memcached-sample
  namespace: memcached-sample
  resourceVersion: "40912"
  uid: e2fc8026-5cbc-4340-a00d-f3d315b831d3
spec:
  replicaSize: 3
status:
  nodes:
  - pod1
  - pod2
  - pod3

The array of three pod names (pod1, pod2, and pod3) appears in status.nodes because you hardcoded them in the Reconcile function. Although the resource was created as v1alpha1, it is saved in the cluster as v1beta1: the v1alpha1 field size: 3 was converted into the v1beta1 field replicaSize: 3.

Migrate the existing v1alpha1 resource into the v1beta1 resource

In the CRD, you define the storage version, but it applies only to newly created resources. To deal with resources of older versions that already exist in the cluster, you can follow the official Kubernetes documentation. However, there is an easier, automated way: you can migrate the existing resources with the migration tool available in the Knative common package, knative.dev/pkg/apiextensions/storageversion/cmd/migrate.

Switch back to master branch:

git checkout master

Add knative.dev/pkg v0.0.0-20210706174620-fe90576475ca to the file go.mod. Then, create a directory called post-install under config/ to host all the YAML files for the migration. Create config/post-install/tools.go with the following contents:

// +build tools

package tools

import (
   // Needed for the storage version too.
   _ "knative.dev/pkg/apiextensions/storageversion/cmd/migrate"
)

The purpose of this file is to import the migration library. Create config/post-install/clusterrole.yaml with the following contents:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: memcached-operator-post-install-job-role
rules:
  # Storage version upgrader needs to be able to patch CRDs.
  - apiGroups:
      - "apiextensions.k8s.io"
    resources:
      - "customresourcedefinitions"
      - "customresourcedefinitions/status"
    verbs:
      - "get"
      - "list"
      - "update"
      - "patch"
      - "watch"
  # Our own resources we care about.
  - apiGroups:
      - "cache.example.com"
    resources:
      - "memcacheds"
    verbs:
      - "get"
      - "list"
      - "create"
      - "update"
      - "delete"
      - "patch"
      - "watch"

Create config/post-install/serviceaccount.yaml with the following contents:

apiVersion: v1
kind: ServiceAccount
metadata:
   name: memcached-operator-post-install-job
   namespace: memcached-operator-system

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
   name: memcached-operator-post-install-job-role-binding
subjects:
   - kind: ServiceAccount
     name: memcached-operator-post-install-job
     namespace: memcached-operator-system
roleRef:
   kind: ClusterRole
   name: memcached-operator-post-install-job-role
   apiGroup: rbac.authorization.k8s.io

Create config/post-install/storage-version-migrator.yaml with the following contents:

apiVersion: batch/v1
kind: Job
metadata:
   name: storage-version-migration
   namespace: memcached-operator-system
   labels:
     app: "storage-version-migration"
spec:
   ttlSecondsAfterFinished: 600
   backoffLimit: 10
   template:
     metadata:
       labels:
         app: "storage-version-migration"
     spec:
       serviceAccountName: memcached-operator-post-install-job
       restartPolicy: OnFailure
       containers:
          - name: migrate
            image: docker.io/houshengbo/migrate:0.0.2
            args:
               - "memcacheds.cache.example.com"

The argument memcacheds.cache.example.com tells the tool which CRD's resources to migrate.

Generate the dependencies for the project:

go mod vendor

Build the image for the migration tool:

ko resolve -f config/post-install -B -t 0.0.2

You also tag it with 0.0.2 because it is the conversion to v1beta1. The image will be published at docker.io/$USER/migrate:0.0.2.

In the file config/post-install/storage-version-migrator.yaml, replace image: ko://github.com/example/memcached-operator/vendor/knative.dev/pkg/apiextensions/storageversion/cmd/migrate with image: docker.io/$USER/migrate:0.0.2.

The job is ready for the resource migration from v1alpha1 to v1beta1.

Save the work with Git:

git add vendor
git add config
git commit -a

Save it into another branch called v1beta1-with-migrator:

git checkout -b v1beta1-with-migrator

See how the resource migration works

To run the following steps, make sure your Kubernetes cluster is in a clean state with no memcached operator installed, or use a fresh new cluster. Either way, do not forget to install cert-manager.

Go to the v1alpha1 branch:

git checkout v1alpha1

Install the operator with the v1alpha1 resource:

make deploy IMG=docker.io/$USER/memcached-operator:v0.0.1

Create the v1alpha1 resource:

kubectl apply -f config/samples/cache_v1alpha1_memcached.yaml

Now you have the v1alpha1 resource saved in the cluster by the v1alpha1 memcached operator. Verify the CR with the following command:

kubectl get Memcached memcached-sample -n memcached-sample -oyaml

You should see it saved as v1alpha1.

Go to the v1beta1-with-migrator branch:

git checkout v1beta1-with-migrator

Install the operator with the v1beta1 resource:

make deploy IMG=docker.io/$USER/memcached-operator:v0.0.2

The older version of the memcached operator is replaced with the newer version, but the resource in the cluster is still stored as v1alpha1.

Check the status of the CRD:

kubectl get crd memcacheds.cache.example.com -oyaml

You can see the stored versions:

status:
  ...
  storedVersions:
  - v1alpha1
  - v1beta1

Run the following command to migrate the resource:

kubectl apply -f config/post-install

Check the status of the CRD:

kubectl get crd memcacheds.cache.example.com -oyaml

You can see the stored versions have changed to the following:

status:
  ...
  storedVersions:
  - v1beta1

Let’s check the CR:

kubectl get Memcached memcached-sample -n memcached-sample -oyaml

Be aware that the CR does not change immediately after the migration job is complete. It might take a few minutes to accomplish the transition. Once the migration is done for the CR, you can get the CR as shown here:

apiVersion: cache.example.com/v1beta1
kind: Memcached
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"cache.example.com/v1alpha1","kind":"Memcached","metadata":{"annotations":{},"name":"memcached-sample","namespace":"memcached-sample"},"spec":{"size":3}}
  creationTimestamp: "2021-07-08T18:59:13Z"
  generation: 1
  name: memcached-sample
  namespace: memcached-sample
  resourceVersion: "48456"
  uid: 2b05b0ba-a823-4a4b-b53b-a59a1e6307cb
spec:
  replicaSize: 3
status:
  nodes:
  - pod1
  - pod2
  - pod3

The size: 3 in v1alpha1 is converted into replicaSize: 3 in v1beta1.

That is how you create a conversion webhook with the Operator SDK to convert resources among different versions, and how you migrate existing resources from the old version to the new version.

Summary

When you come across multiple versions of CRDs, you need to leverage the conversion webhook to convert the custom resources from the old version to the new, and vice versa. With the Operator SDK, you can create the conversion webhook for the CRDs across multiple versions. With the migration tool in the Knative common package, you can migrate the existing custom resources to the new version automatically.