Dive into operators, Part 3: Design and create operators based on the controller runtime

The open source Operator Framework toolkit manages Kubernetes-native applications, called Operators, in a more effective, automated, and scalable way. This tutorial gives you a thorough introduction to the Operator Framework, including the Operator SDK (a developer toolkit), the Operator Registry, and the Operator Lifecycle Manager (OLM). The OLM takes care of the lifecycle of operators, including updates to the operators and their resources. The OLM is also part of the OpenShift 4.x Container Platform.

This tutorial explains how to use the Operator SDK and OLM to develop and install operators. It walks through steps of developing an operator, including how to create the operator template, construct the reconciling logic in Ansible, build the operator image, register the operator, deploy the operator, and then install the application through the operator on IBM Kubernetes and OpenShift clusters.

This tutorial is the third part of a four-part series examining the use of operators. Part 1 provides an architectural overview of operators and Part 2 shows you how to pass configurations to Kubernetes operators with kustomize.


Before you walk through this tutorial, you need to set up the following environment:

  • Set up a cloud and Kubernetes environment like the IBM Cloud Kubernetes Service.
  • If you are going to install an operator through the OLM (step 7e), set up an OpenShift Container Platform 4.1 cluster.
  • Install Operator SDK, as described in Operator SDK Installation.
  • Install git so you have access to this tutorial’s GitHub repo. Run the following command to clone the repo: git clone https://github.com/adrian555/ofip.git

Estimated time

Completing this tutorial should take approximately 30 minutes.

Basic concepts when working with operators

Operators were introduced in 2016 by CoreOS. Operators are a method of packaging, deploying, and managing a Kubernetes application. An operator has its custom controller watching the custom resources specifically defined for the applications. Therefore, an operator mainly consists of Kubernetes CustomResourceDefinitions (CRDs) and controller logic. Operators extend the Kubernetes API with CRDs. When you create a new CRD, the Kubernetes API Server creates a new set of RESTful endpoints, for example, /apis/os.ibm.com/v1alpha1/namespaces/default/SparkCluster, to manage the custom resource. The operator itself is also a Kubernetes application. So, to run an operator on an OpenShift cluster is to create a Kubernetes deployment: you deploy an operator pod that can run one or more containers with pre-built images.
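For illustration, the CRD behind the endpoint above might look like the following sketch. The group, version, and kind are assumptions derived from the example endpoint path, not a manifest from this tutorial's repo:

```yaml
# Hypothetical CRD matching the example endpoint above; names are illustrative
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: sparkclusters.os.ibm.com
spec:
  group: os.ibm.com
  version: v1alpha1
  scope: Namespaced
  names:
    kind: SparkCluster
    listKind: SparkClusterList
    plural: sparkclusters
    singular: sparkcluster
```

Once this CRD is applied, the API server serves the new RESTful endpoints, and `kubectl get sparkclusters` works like any built-in resource.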

With operators, it is easy to manage complex stateful applications. A database application is one example of stateful applications. “Day 2” activities, such as patching, updating, upgrading, and scaling, are other good reasons to manage applications with operators.

The Operator Framework offers an open source toolkit to build, test, and package operators and to manage their lifecycle.

This tutorial covers its two main components, operator-sdk and operator-lifecycle-manager, which are essential for creating and managing operators. The operator-sdk provides an SDK for building operators, and the operator-lifecycle-manager provides a service to discover, install, and manage operators.

An operator watches the custom resources defined through CRDs. The OLM runs a version of an operator described by a ClusterServiceVersion, which contains all the resources, including the CRDs and role-based access control rules, required to run the operator.

Create a SparkCluster operator with the Operator SDK

In this tutorial you build an operator to create a stand-alone Spark cluster running on an OpenShift 4.1 cluster. A Spark stand-alone cluster consists of one or more master and worker nodes. Each node runs the Spark binary in a daemon process. The master node provides the SPARK_MASTER URL for Spark drivers to submit applications to. The worker nodes receive jobs from the master and spawn executors to run the tasks. The application managed by the operator creates such a Spark cluster.

Complete the following steps to build and install the Spark operator:

  1. Build Spark Docker image.

    Because you are running on an OpenShift Kubernetes cluster, you must first prepare the container image for the Spark master and worker nodes. This tutorial uses Spark’s docker-image-tool to build and push the Docker image, with some tweaks to install pyspark and make the image work for both master and worker nodes.

     # download the original Spark binary
     wget http://ftp.wayne.edu/apache/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
     tar zxvf spark-2.4.4-bin-hadoop2.7.tgz
     # modify Dockerfile
     cd spark-2.4.4-bin-hadoop2.7
     cp ../patch/run.sh sbin
     cp ../patch/Dockerfile kubernetes/dockerfiles/spark
     # build and push the docker image
     # NOTE: replace the repository with the one you owned
     ./bin/docker-image-tool.sh -r docker.io/dsml4real -t v2.4.4 build
     ./bin/docker-image-tool.sh -r docker.io/dsml4real -t v2.4.4 push
     cd ..
  2. Create the Spark operator.

    Operator SDK can help you build one of the following types of operators: Go, Helm and Ansible. The difference between these three types of operators is the maturity of an operator’s encapsulated operations:

    Operator maturity model

    The operator maturity model is from the OpenShift Container Platform document.

    This tutorial chooses Ansible as the mechanism to write the controller logic for the Spark operator.

    You can install the Operator SDK with brew on macOS or download it as a binary.

    After the SDK is installed, run the following command to create the operator:

     mkdir temp
     cd temp
     operator-sdk new spark-operator --api-version=ibm.com/v1alpha1 --kind=Spark --type=ansible
     cd ..

    This command creates the scaffolding code for the operator under the spark-operator directory, including the manifests of CRDs, example custom resource, the role-based access control role and rolebinding, and the Ansible playbook role and tasks. It also creates the Dockerfile to build the image for the operator. The directory structure and contents are similar to the example included in the repo.
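    The scaffolding generated by operator-sdk new for an Ansible-type operator typically looks like the following layout (exact contents can vary by SDK version, so treat this as a sketch):

    ```
    spark-operator/
    ├── build/
    │   └── Dockerfile          # builds the operator image
    ├── deploy/
    │   ├── crds/               # CRD and example custom resource manifests
    │   ├── operator.yaml       # operator deployment
    │   ├── role.yaml
    │   ├── role_binding.yaml
    │   └── service_account.yaml
    ├── roles/
    │   └── spark/              # Ansible role with tasks, defaults, and handlers
    └── watches.yaml            # maps the custom resource to the Ansible role
    ```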

    After an Ansible type of operator is installed on the OpenShift cluster, a pod is created and runs with two containers: ansible and operator. The ansible-runner process runs the role or playbook provided in the spark-operator/roles/spark directory. For details, see operator-framework/operator-sdk.
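    The mapping from the custom resource to the Ansible role is defined in watches.yaml. For this operator it would look similar to the following sketch, where the role path is the SDK's default location inside the operator image:

    ```yaml
    # watches.yaml: tells the ansible-runner which role to run
    # when a Spark custom resource changes
    - version: v1alpha1
      group: ibm.com
      kind: Spark
      role: /opt/ansible/roles/spark
    ```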

  3. Add Ansible tasks to install a Spark cluster.

    The application managed by the operator installs a Spark cluster. You can install a Spark cluster through Ansible playbook tasks. This tutorial already implements the code in the spark-operator/roles/spark directory.

    The main tasks are in the tasks/main.yml file. It creates a spark-master Kubernetes deployment to run a Spark master pod, a spark-worker Kubernetes deployment to run one or more Spark worker pods, and a spark-cluster Kubernetes service to route Spark driver requests to the Spark master. It can also create a spark-worker-pvc Kubernetes PersistentVolumeClaim, which the Spark worker pods use if access to a distributed file system (DFS) server is provided. You can use the worker-size parameter to specify the number of pods created by the spark-worker deployment. This parameter is passed in through the CustomResource when you create a Spark cluster from the Spark operator.
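    As a sketch of what such a task looks like, the k8s Ansible module can create the spark-master deployment declaratively. The field values and image name below are illustrative, not the tutorial's exact code, which lives in spark-operator/roles/spark/tasks/main.yml:

    ```yaml
    # Illustrative task using the k8s module; meta.namespace is supplied
    # by the Ansible operator from the custom resource's namespace
    - name: Create the spark-master deployment
      k8s:
        definition:
          apiVersion: apps/v1
          kind: Deployment
          metadata:
            name: spark-master
            namespace: "{{ meta.namespace }}"
          spec:
            replicas: 1
            selector:
              matchLabels:
                app: spark-master
            template:
              metadata:
                labels:
                  app: spark-master
              spec:
                containers:
                - name: spark-master
                  image: dsml4real/spark:v2.4.4  # assumed image name from step 1
    ```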

    It is worth mentioning that these tasks are the same when you install a Spark cluster without an operator. All this tutorial step does is rewrite them to use with the Ansible automation tool.

  4. Update role-based access control roles.

    The Operator SDK command creates a spark-operator service account with a specific role and role binding. This service account installs and manages the application, and the role defines the role-based access control restrictions for the service account. For security, carefully choose the roles granted to this service account. Role-based access control authorizes given actions for the service account. If an operator is not supposed to delete certain resources, do not grant the delete action. Consider another situation, where an operator has dependencies: one of them might perform an unexpected action from version to version, and you want to ensure that such an action is not allowed.

    Also, if an operator watches resources from other namespaces across the OpenShift cluster, you might need to choose appropriate cluster roles instead. For simplicity, this tutorial binds the cluster-admin role to the spark-operator service account.
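    For example, a more restrictive role might scope the service account to only the resources the Ansible tasks create, leaving out the delete verb. This is a sketch of such a role, not the tutorial's role.yaml (which, as noted, simply binds cluster-admin):

    ```yaml
    # Illustrative least-privilege role for the spark-operator service account
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: spark-operator
    rules:
    - apiGroups: ["", "apps"]
      resources: ["pods", "services", "deployments", "persistentvolumeclaims"]
      verbs: ["get", "list", "watch", "create", "update", "patch"]
    ```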

  5. Create the Spark manifest.

    An operator creates and watches custom resources defined by CRDs. The CRDs are saved in the spark-operator/deploy/crds directory. To create an instance of a CRD, you create a manifest file, in which you can also specify parameters for the custom resource.

    The Spark operator in this tutorial creates the Spark custom resource. One example of the manifest to create an application of the Spark custom resource is the ibm_v1alpha1_spark_pv_cr.yaml file.
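    Such a manifest is a short YAML file. A sketch resembling that file might look like the following; the spec field shown is an assumption based on the worker-size parameter described earlier, so check the repo's file for the exact field names:

    ```yaml
    # Illustrative Spark custom resource manifest
    apiVersion: ibm.com/v1alpha1
    kind: Spark
    metadata:
      name: example-spark
    spec:
      worker-size: 2   # assumed field: number of spark-worker pods
    ```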

  6. Build the Docker image for operator and update operator deployment to use the image.

    The Operator SDK command already generates a Dockerfile for the operator image. Run the following command to build the Docker image:

     cd spark-operator
     # build the docker image
     operator-sdk build dsml4real/spark-operator:v0.0.1
     # push the docker image
     docker push dsml4real/spark-operator:v0.0.1
     cd ..

    As mentioned earlier, the operator code runs through a deployment defined in the spark-operator/deploy/operator.yaml file. Update that file and make sure that both the ansible and operator containers run with the image you built earlier.

    So far, you have prepared the artifacts for the application: a stand-alone Spark cluster and an Ansible type of operator named spark-operator (with the Operator SDK tool). You added the CRDs and controller logic to deploy a Spark cluster, and you built the docker image for the operator. These are the common steps for writing an operator. Now the Spark operator is ready. You can either test the operator locally or install it on the cluster.

  7. Install Spark operator.

    There are two approaches to installing an operator. One is to run OpenShift client oc commands to create the service account, the role binding, and then the operator itself. The other is to package the resources into a ClusterServiceVersion and let the Operator Lifecycle Manager (OLM) install it.

    To install manually, run the following commands to install the Spark operator:

     cd spark-operator
     # create a new project (optional)
     oc new-project tutorial
     oc adm policy add-scc-to-user anyuid -z default
     # create CRDs
     oc apply -f deploy/crds/ibm_v1alpha1_spark_crd.yaml
     # create service account
     oc apply -f deploy/service_account.yaml
     # create role and role binding
     oc apply -f deploy/role.yaml
     oc apply -f deploy/role_binding.yaml
     # create the operator deployment
     oc apply -f deploy/operator.yaml
     cd ..

    A pod like the following example is created:

     oc get pods -n tutorial
     ### NAME                              READY   STATUS    RESTARTS   AGE
     ### spark-operator-7477ff4c94-lgb6z   2/2     Running   0          40s

    To check the progress, run the following command to view the logs from the operator container of the spark-operator pod:

     kubectl logs deployment/spark-operator operator -n tutorial -f

    To install through the OLM, two steps are required. First, generate a ClusterServiceVersion manifest that includes the metadata, CRDs, and installation strategy for the operator. Second, register the operator in a catalog so that the OLM can discover it.

    OLM and catalog source: Add operators to registry and use in catalog source

    As illustrated in the figure, you must complete several steps:

    a) Generate ClusterServiceVersion.

    The Operator SDK tool also helps generate the ClusterServiceVersion. Run following command:

     cd spark-operator
     operator-sdk olm-catalog gen-csv --csv-version 0.0.1 --update-crds
     cd ..

    The ClusterServiceVersion manifest is generated in the spark-operator/deploy/olm-catalog directory. The directory contains different versions of ClusterServiceVersions and the dependent CRDs, together with a package manifest describing the channels for different installation paths for the application (such as alpha or stable). Use the channel in the Subscription manifest when creating or upgrading the CustomResource (that is, the application).
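    When you install through the OpenShift console in a later step, the console creates the Subscription for you; a hand-written one would look similar to this sketch (the channel name and namespaces here are assumptions, not values from the repo):

    ```yaml
    # Illustrative Subscription tying a channel in the catalog to a namespace
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: spark-operator
      namespace: openshift-operators
    spec:
      channel: alpha
      name: spark-operator
      source: spark-operator-catalog
      sourceNamespace: openshift-operator-lifecycle-manager
    ```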

    You can update the ClusterServiceVersion file with metadata such as adding an icon to the operator. This tutorial comes with the updated version of ClusterServiceVersion in the spark-operator/deploy/olm directory.

    b) Verify the ClusterServiceVersion.

    Now you verify the ClusterServiceVersion with the operator-courier tool, part of the Operator Framework. You can install operator-courier as a pip package by running the following command:

     pip3 install operator-courier

    Run the following command to validate the ClusterServiceVersion.

     cd spark-operator/deploy/olm
     operator-courier verify spark-operator
     cd -

    c) Build the Docker image for the operator-registry.

    The operator-registry is also part of the Operator Framework. It provides operator catalog data to the OLM by running a registry server on a certain port for OLM to discover the operator.

    To build the docker image for operator registry, run the following commands:

     cd spark-operator/deploy/olm
     # copy operator manifests to a directory for Dockerfile
     mkdir operators
     cp -r spark-operator operators
     # build the image
     docker build . -t dsml4real/spark-operator-registry:v0.0.1
     # push the image
     docker push dsml4real/spark-operator-registry:v0.0.1
     cd -

    d) Create a CatalogSource.

    CatalogSource is a Kubernetes resource containing a catalog of operators that can be installed through the OLM. The catalog source runs the operator registry server as a service exposed on port 50051. It runs with the Docker image built above and initializes a SQLite database local to the container for data querying, which allows the OLM to discover and query the operators registered in the catalog.

    A catalog source manifest looks like the following example yaml file:

     apiVersion: operators.coreos.com/v1alpha1
     kind: CatalogSource
     metadata:
       name: spark-operator-catalog
       namespace: openshift-operator-lifecycle-manager
     spec:
       sourceType: grpc
       image: dsml4real/spark-operator-registry:v0.0.1
       imagePullPolicy: Always
       displayName: Tutorial Operators
       publisher: IBM

    The namespace field specifies the namespace where the catalog source runs. To deploy, run the following commands:

     cd spark-operator/deploy/olm
     oc apply -f catalogsource.yaml
     cd -

    To check the progress, run the following commands:

     oc get all -n openshift-operator-lifecycle-manager|grep spark-operator-catalog

    After the catalog service is up and running, run the following command to verify that the spark-operator operator is shown in the packages for the OLM:

     oc get packagemanifest -n openshift-operator-lifecycle-manager

    You create the catalog source in the openshift-operator-lifecycle-manager namespace so that you can use the built-in OperatorGroup, olm-operators, in this namespace to install the Spark operator. You can also create the catalog source in another namespace, such as openshift-marketplace, but then you must create an OperatorGroup there. Future tutorials will cover OperatorGroup topics.

    e) Install the Spark operator through OpenShift console.

    Now you can see that the Spark operator appears in the OpenShift console as follows:

    Spark operator in OpenShift console

    To install the operator, switch to the openshift-operators namespace and click Create Subscription. Follow the instructions to create the Spark operator.

    Installed Spark operator in OpenShift console

    To summarize, there are two approaches to installing an operator. Installing through the OLM requires more preparation steps, but users benefit in the long run because the operator is managed and monitored by the OLM.

  8. Create a Spark cluster.

    Finally, you create a Spark cluster using the operator you installed in the previous step. Because the Spark operator created a custom resource Spark in the cluster, creating a Spark cluster basically means creating an instance of Spark. There are two approaches.

    To install manually, you just need to create a manifest that uses the Spark custom resource. A sample manifest is provided by the Operator SDK when the operator is created. Because you added some extra parameters to be used by the Ansible tasks, you need to add those parameters to the manifest.

    Create the Spark cluster with this manifest:

     cd spark-operator
     oc apply -f deploy/crds/ibm_v1alpha1_spark_pv_cr.yaml
     cd ..

    To check the progress and the output of the Ansible tasks, run the following command:

     kubectl logs deployment/spark-operator operator -n openshift-operators -f

    Also run the following commands to make sure the pods and the service for the Spark cluster are running:

     oc get pods |grep spark
     ### spark-master-7bc49bc8f-rfjd8    1/1     Running   0          2m19s
     ### spark-worker-86466967fd-dq42s   1/1     Running   0          2m17s
     oc get svc |grep spark
     ### spark-cluster   NodePort   <none>        7077:31687/TCP,8080:32142/TCP   2m27s

    To install with the console: because the Spark operator was installed through the OLM, you can create the application from the operator with a few clicks.

    From the OpenShift console, switch to the namespace where you want to create the Spark cluster. Then click the spark-operator operator, and you see the following page:

    Install Spark cluster through operator: Create Spark instance through console

    Click Create New and follow the instructions to create a Spark cluster.

    Spark cluster created: View the Spark instance.

    The example-spark is the instance name of the Spark custom resource created by the Spark operator. You can also verify that there are spark-master and spark-worker pods and a spark-cluster service running in the cluster.


This tutorial provided a step-by-step guide to using the Operator Framework open source project to create, install, and deploy an operator and the application managed by the operator. You can see that the Operator SDK and OLM are great toolsets to assist you in developing and managing operators in Kubernetes.

If you are interested in testing the use of the Spark cluster you created above, see the example Jupyter notebook on GitHub. Before running it, replace openshift-cluster-hostname with the hostname of the node where the spark-cluster service is running.