Dive into operators, Part 3: Design and create operators based on the controller runtime
Learn to use the Operator Framework, an open source toolkit for managing operators
Operators are an effective and efficient approach for managing applications. Operators are also Kubernetes applications. In the previous part of this series, Pass configuration to Kubernetes operators with kustomize, you saw an example of a single configuration, and you learned how to use kustomize to patch configurations of other forms, such as JSON patch, and runtime data with variables.
This part dives deep into the Operator Framework, covering all aspects from the Operator SDK, a toolkit to help developers, to the Operator Registry and the Operator Lifecycle Manager (OLM). The OLM takes care of the lifecycle of operators, including updates to the operators and their resources. The OLM is also part of the OpenShift 4.x Container Platform. This tutorial explains how to use the Operator SDK and OLM to develop and install operators. It walks through the steps of developing an operator, including how to create the operator template, construct the reconciling logic in Ansible, build the operator image, register the operator, deploy the operator, and then install the application through the operator on IBM Kubernetes and OpenShift clusters.
Before you walk through this tutorial, you need to set up the following environment:
- Set up a cloud and Kubernetes environment like the IBM Cloud Kubernetes Service.
- If you are going to install an operator through the OLM (the OLM option in the Install Spark operator step), set up an OpenShift Container Platform 4.1 cluster.
- Install Operator SDK, as described in Operator SDK Installation.
- Install git so you have access to this tutorial’s GitHub repo. Run the following command to clone the repo:
git clone https://github.com/adrian555/ofip.git
Completing this tutorial should take approximately 30 minutes.
Basic concepts when working with operators
Operators were introduced in 2016 by CoreOS. Operators are a method of packaging, deploying, and managing a Kubernetes application. An operator has a custom controller that watches the custom resources defined specifically for the application. Therefore, an operator mainly consists of Kubernetes CustomResourceDefinitions (CRDs) and controller logic. Operators extend the Kubernetes API with CRDs. When you create a new CRD, the Kubernetes API Server creates a new set of RESTful endpoints, for example, /apis/os.ibm.com/v1alpha1/namespaces/default/SparkCluster, to manage the custom resource. The operator itself is also a Kubernetes application. So, to run an operator on an OpenShift cluster, you create a Kubernetes deployment: you deploy an operator pod that can run one or more containers with pre-built images.
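As a minimal sketch of how such an endpoint is composed, the following shell function joins the API group, version, namespace, and resource name into a path. The function name cr_endpoint is hypothetical, and the plural name sparks is an assumption about the CRD generated later in this tutorial; Kubernetes serves custom resources under the plural name declared in the CRD.

```shell
# Hypothetical helper: build the REST path that the API server exposes
# for a namespaced custom resource.
# Arguments: API group, version, namespace, plural resource name.
cr_endpoint() {
  local group="$1" version="$2" namespace="$3" plural="$4"
  printf '/apis/%s/%s/namespaces/%s/%s\n' "$group" "$version" "$namespace" "$plural"
}

# Example: the Spark resource from this tutorial ("sparks" is an assumed plural).
cr_endpoint ibm.com v1alpha1 tutorial sparks
```

On a running cluster, a path like this can be queried with oc get --raw followed by the path.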
With operators, it is easy to manage complex stateful applications. A database is one example of a stateful application. “Day 2” activities, such as patching, updating, upgrading, and scaling, are other good reasons to manage applications with operators.
The Operator Framework offers an open source toolkit to build, test, and package operators and to manage their lifecycle. It includes the following tools:
- operator-sdk: write, test, and package operators
- operator-courier: build, verify, and push operator manifests
- operator-registry: store the manifest data in a database and provide operator catalog data to the Operator Lifecycle Manager
- operator-lifecycle-manager: control the installation, upgrade, and role-based access control of operators (the “operator of operators”)
- operator-metering: collect operational metrics of operators for “day 2” management
- operator-marketplace: register off-cluster operators
- community-operators: host community-created operators and publish to operatorhub.io
This tutorial covers the two main components, operator-sdk and operator-lifecycle-manager, which are essential for creating and managing operators. The operator-sdk provides an SDK for building operators, and the operator-lifecycle-manager provides a service to discover, install, and manage operators.
An operator watches the custom resources defined through CRDs. The OLM runs a version of an operator described by a ClusterServiceVersion, which contains all the resources, including CRDs and role-based access control rules, needed to run the operator.
Create a SparkCluster operator with the Operator SDK
In this tutorial, you build an operator to create a stand-alone Spark cluster running on an OpenShift 4.1 cluster. A Spark stand-alone cluster consists of one or more master and worker nodes. Each node runs the Spark binary in a daemon process. The master node provides the SPARK_MASTER URL for Spark drivers to submit applications to run on. The worker node receives jobs from the master and spawns executors to run the tasks. The application managed by the operator creates such a Spark cluster.
Complete the following steps to build and install the Spark operator:
Build Spark Docker image.
Because you are running on an OpenShift Kubernetes cluster, you must first prepare the container image for the Spark master and worker nodes. This tutorial uses Spark’s docker-image-tool to build and push the Docker image, with some tweaks to install pyspark and make the image usable for both master and worker nodes.
# download the original Spark binary
wget http://ftp.wayne.edu/apache/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
tar zxvf spark-2.4.4-bin-hadoop2.7.tgz
# modify Dockerfile
cd spark-2.4.4-bin-hadoop2.7
cp ../patch/run.sh sbin
cp ../patch/Dockerfile kubernetes/dockerfiles/spark
# build and push the docker image
# NOTE: replace the repository with one that you own
./bin/docker-image-tool.sh -r docker.io/dsml4real -t v2.4.4 build
./bin/docker-image-tool.sh -r docker.io/dsml4real -t v2.4.4 push
cd ..
Create the Spark operator.
Operator SDK can help you build one of the following types of operators: Go, Helm, or Ansible. The difference between these three types of operators is the maturity of an operator’s encapsulated operations.
The operator maturity model is from the OpenShift Container Platform document.
This tutorial chooses Ansible as the mechanism to write the controller logic for the Spark operator.
You can install the Operator SDK with brew on macOS or download it as a binary.
After the SDK is installed, run the following commands to create the operator:
mkdir temp
cd temp
operator-sdk new spark-operator --api-version=ibm.com/v1alpha1 --kind=Spark --type=ansible
cd ..
This command creates the scaffolding code for the operator under the spark-operator directory, including the manifests of CRDs, an example custom resource, the role-based access control role and role binding, and the Ansible playbook role and tasks. It also creates the Dockerfile to build the image for the operator. The directory structure and contents are similar to the example included in the repo.
After an Ansible type of operator is installed on the OpenShift cluster, a pod is created that runs two containers. The ansible-runner process runs the role or playbook provided in the spark-operator/roles/spark directory. For details, see operator-framework/operator-sdk.
Add Ansible tasks to install a Spark cluster.
The application managed by the operator installs a Spark cluster. You can install a Spark cluster through Ansible playbook tasks. This tutorial already implements the code in the repo.
The main tasks are in the tasks/main.yml file. It creates a spark-master Kubernetes deployment to run a Spark master pod, a spark-worker Kubernetes deployment to run one or more Spark worker pods, and a spark-cluster Kubernetes service through which Spark drivers send requests to the Spark master. You can also use it to create a PersistentVolumeClaim, which the Spark worker pods use if access to a distributed file system (DFS) server is provided. You can use worker-size to specify the number of pods created by the spark-worker deployment. This parameter is passed in through the CustomResource when you create a Spark cluster from the Spark operator.
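To make the parameter passing concrete, here is a minimal sketch of a custom resource that carries the worker count. The helper name spark_cr_manifest is hypothetical, and the spec key worker_size is an assumed spelling of the worker-size parameter; check the tutorial's repo for the exact field name that the Ansible tasks expect.

```shell
# Hypothetical helper: print a Spark custom resource manifest.
# Argument: number of worker pods (defaults to 1).
# NOTE: the spec key "worker_size" is an assumption for illustration.
spark_cr_manifest() {
  local workers="${1:-1}"
  cat <<EOF
apiVersion: ibm.com/v1alpha1
kind: Spark
metadata:
  name: example-spark
spec:
  worker_size: ${workers}
EOF
}

# Example: print a manifest that asks for three workers.
spark_cr_manifest 3
```

Once the operator is installed, output like this could be piped to oc apply -f - to create the cluster.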
It is worth mentioning that these tasks are the same ones you would perform to install a Spark cluster without an operator. All this tutorial step does is rewrite them for use with the Ansible automation tool.
Update role-based access control roles.
The Operator SDK command creates a spark-operator service account with a specific role and role binding. This service account installs and manages the application, and the role defines the role-based access control restrictions for the service account. For security, carefully choose the roles to grant to this service account. Role-based access control authorizes the service account to perform given actions. If an operator is not supposed to delete certain resources, do not grant the delete action. Consider another situation, where an operator has dependencies: one of them might perform an unexpected action from version to version, and you want to be sure that the action is not allowed.
Also, if an operator watches resources from other namespaces across the OpenShift cluster, you might need to choose appropriate cluster roles instead. This tutorial just binds the cluster-admin role to the spark-operator service account for simplicity.
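For comparison with the cluster-admin shortcut, the following sketch prints a namespaced Role that omits the delete verb. The helper name, the role name, and the resource list are assumptions for illustration, not the role generated by the Operator SDK.

```shell
# Hypothetical helper: print a namespaced Role that lets the operator read,
# create, and update resources but grants no removal verb.
# The role name and the resource list are assumptions for illustration.
restricted_role_manifest() {
  local namespace="${1:-tutorial}"
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-operator-restricted
  namespace: ${namespace}
rules:
- apiGroups: ["", "apps"]
  resources: ["pods", "services", "persistentvolumeclaims", "deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
EOF
}

# Example: print the Role for the tutorial namespace.
restricted_role_manifest tutorial
```

With a role like this bound to the service account, an unexpected delete request from the operator or one of its dependencies is rejected by the API server instead of silently succeeding.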
Create the Spark manifest.
An operator creates and watches custom resource definitions. These CRDs are saved in the spark-operator/deploy/crds directory. To create an instance of the CRD, you need to create a manifest file. You can specify parameters for the custom resource as well.
The Spark operator in this tutorial creates the Spark custom resource. One example of a manifest that creates an instance of the Spark custom resource is the ibm_v1alpha1_spark_pv_cr.yaml file.
Build the Docker image for the operator and update the operator deployment to use the image.
The Operator SDK command already generates a Dockerfile for the operator image. Run the following commands to build and push the Docker image:
cd spark-operator
# build the docker image
operator-sdk build dsml4real/spark-operator:v0.0.1
# push the docker image
docker push dsml4real/spark-operator:v0.0.1
cd ..
As mentioned earlier, the operator code is run through a deployment defined in the spark-operator/deploy/operator.yaml file. Update that file and make sure that both containers in the deployment, including the operator container, use the image you built earlier.
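One way to make that update is a small substitution over the file. The sketch below assumes the generated manifest contains a placeholder string in its image fields; the helper name set_operator_image and the placeholder REPLACE_IMAGE are assumptions, so open deploy/operator.yaml first to see what the image fields actually contain.

```shell
# Hypothetical helper: replace an image placeholder in a deployment manifest.
# Arguments: manifest file, placeholder text, image reference.
# NOTE: the placeholder text is an assumption; inspect the generated file.
set_operator_image() {
  local file="$1" placeholder="$2" image="$3"
  local tmp
  tmp="$(mktemp)"
  # Use "|" as the sed delimiter because image references contain "/".
  # Write back with cat so the original file permissions are kept.
  sed "s|${placeholder}|${image}|g" "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example (assumed placeholder REPLACE_IMAGE):
# set_operator_image deploy/operator.yaml REPLACE_IMAGE dsml4real/spark-operator:v0.0.1
```

Because the substitution is global, every container in the deployment that carries the placeholder ends up with the same image.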
So far, you have prepared the artifacts for the application: a stand-alone Spark cluster and an Ansible type of operator named spark-operator (with the Operator SDK tool). You added the CRDs and controller logic to deploy a Spark cluster, and you built the Docker image for the operator. These are the common steps for writing an operator. Now the Spark operator is ready. You can either test the operator locally or install it on the cluster.
Install Spark operator.
There are two approaches to installing an operator. One is to run some OpenShift client oc commands to create the service account, the role binding, and then the operator itself. The other is to package the resources into a ClusterServiceVersion and then let the Operator Lifecycle Manager (OLM) install it.
To install manually, use the following commands to install the Spark operator:
cd spark-operator
# create a new project (optional)
oc new-project tutorial
oc adm policy add-scc-to-user anyuid -z default
# create CRDs
oc apply -f deploy/crds/ibm_v1alpha1_spark_crd.yaml
# create service account
oc apply -f deploy/service_account.yaml
# create role and role binding
oc apply -f deploy/role.yaml
oc apply -f deploy/role_binding.yaml
# create the operator deployment
oc apply -f deploy/operator.yaml
cd ..
A pod like the following example is created:
oc get pods -n tutorial
### NAME READY STATUS RESTARTS AGE
### spark-operator-7477ff4c94-lgb6z 2/2 Running 0 40s
To check the progress, run the following command to view the logs from the operator container of the spark-operator deployment:
kubectl logs deployment/spark-operator operator -n tutorial -f
To install through the OLM, two steps are required. First, generate a ClusterServiceVersion manifest that includes the metadata, CRDs, and installation strategy for the operator. Second, register the operator in a catalog so that the OLM can discover it.
As illustrated in the figure, you must complete several steps:
a) Generate the ClusterServiceVersion
The Operator SDK tool also helps generate the ClusterServiceVersion. Run the following commands:
cd spark-operator
operator-sdk olm-catalog gen-csv --csv-version 0.0.1 --update-crds
cd ..
The ClusterServiceVersion manifest is generated in the spark-operator/deploy/olm-catalog directory. The directory contains different versions of ClusterServiceVersions and dependent CRDs, together with a package manifest describing the channels for different installation paths for the application (such as alpha or stable). You reference a channel in the Subscription manifest when creating or upgrading the CustomResource (for example, the application).
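To show where the channel name ends up, here is a minimal sketch of a Subscription manifest. The helper name, the package name spark-operator, and the default channel alpha are assumptions; the catalog source name and namespaces are the ones this tutorial uses in the catalog source and console steps.

```shell
# Hypothetical helper: print a Subscription for the Spark operator.
# Argument: channel name from the package manifest (assumed "alpha" by default).
spark_subscription_manifest() {
  local channel="${1:-alpha}"
  cat <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: spark-operator
  namespace: openshift-operators
spec:
  channel: ${channel}
  name: spark-operator
  source: spark-operator-catalog
  sourceNamespace: openshift-operator-lifecycle-manager
EOF
}

# Example: subscribe to the stable channel instead of the default.
spark_subscription_manifest stable
```

The OLM uses the channel to decide which ClusterServiceVersion to install, so switching channels is how a user moves between installation paths.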
You can update the ClusterServiceVersion file with metadata, such as adding an icon to the operator. This tutorial comes with an updated version of the ClusterServiceVersion.
b) Verify the ClusterServiceVersion
Now you verify the ClusterServiceVersion with the operator-courier tool, part of the Operator Framework. You can install operator-courier as a pip package by running the following command:
pip3 install operator-courier
Run the following commands to validate the ClusterServiceVersion:
cd spark-operator/deploy/olm
operator-courier verify spark-operator
cd -
c) Build the Docker image for the operator-registry
The operator-registry is also part of the Operator Framework. It provides operator catalog data to the OLM by running a registry server on a certain port so that the OLM can discover the operator.
To build the Docker image for the operator registry, run the following commands:
cd spark-operator/deploy/olm
# copy operator manifests to a directory for Dockerfile
mkdir operators
cp -r spark-operator operators
# build the image
docker build . -t dsml4real/spark-operator-registry:v0.0.1
# push the image
docker push dsml4real/spark-operator-registry:v0.0.1
cd -
d) Create a CatalogSource
A CatalogSource is a Kubernetes resource containing a catalog of operators that can be installed through the OLM. The catalog source runs the operator registry server as a service exposed on port 50051. It runs with the Docker image built above and initializes a SQLite database local to the container for data querying. This allows the OLM to discover and query the operators registered in the catalog.
A catalog source manifest looks like the following example:
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: spark-operator-catalog
  namespace: openshift-operator-lifecycle-manager
spec:
  sourceType: grpc
  image: dsml4real/spark-operator-registry:v0.0.1
  imagePullPolicy: Always
  displayName: Tutorial Operators
  publisher: IBM
The namespace field specifies the namespace where the catalog source runs. To deploy, run the following commands:
cd spark-operator/deploy/olm
oc apply -f catalogsource.yaml
cd -
To check the progress, run the following command:
oc get all -n openshift-operator-lifecycle-manager|grep spark-operator-catalog
After the catalog service is up and running, run the following command to verify that the spark-operator operator is shown in the packages for the OLM:
oc get packagemanifest -n openshift-operator-lifecycle-manager
You are creating the catalog source in the openshift-operator-lifecycle-manager namespace so that you can use the built-in OperatorGroup in this namespace to install the Spark operator. You can also create the catalog source in another namespace, such as openshift-marketplace, but then you must create an OperatorGroup there. Future tutorials will cover this topic.
e) Install the Spark operator through the OpenShift console.
Now you can see the Spark operator in the OpenShift console.
To install the operator, switch to the openshift-operators namespace and click Create Subscription. Follow the instructions to create the Spark operator.
To summarize, there are two approaches to installing an operator. Installing through the OLM requires more preparation steps, but users benefit in the long run because the operator is managed and monitored by the OLM.
Create a Spark cluster.
Finally, you create a Spark cluster using the operator you installed in the previous step. There are two approaches. Because the Spark operator creates a custom resource Spark in the cluster, creating a Spark cluster basically means creating an instance of that custom resource.
To install manually, you just need to create a manifest that uses the Spark custom resource. A sample manifest is provided by the Operator SDK when the operator is created. Because you added some extra parameters to be used by the Ansible tasks, you need to add those parameters in the manifest.
You create the Spark cluster with this manifest:
cd spark-operator
oc apply -f deploy/crds/ibm_v1alpha1_spark_pv_cr.yaml
cd ..
To check the progress and the output of the Ansible tasks, run the following command:
kubectl logs deployment/spark-operator operator -n openshift-operators -f
Also run the following commands to make sure the pods and service for the Spark cluster are running:
oc get pods |grep spark
### spark-master-7bc49bc8f-rfjd8 1/1 Running 0 2m19s
### spark-worker-86466967fd-dq42s 1/1 Running 0 2m17s
oc get svc |grep spark
### spark-cluster NodePort 172.30.56.52 <none> 7077:31687/TCP,8080:32142/TCP 2m27s
To install with the console, because the Spark operator was installed through the OLM, you can create the application from the operator with a few clicks.
From the OpenShift console, switch to the namespace where you want to create the Spark cluster. Then click the spark-operator operator to open its page.
Click Create New and follow the instructions to create a Spark cluster.
example-spark is the instance name of the Spark custom resource created by the Spark operator. You can also verify that there are spark-master and spark-worker pods and a spark-cluster service running in the cluster.
This tutorial provided a step-by-step guide to using the Operator Framework open source project to create, install, and deploy an operator and the application managed by the operator. You can see that the Operator SDK and OLM are great toolsets to assist you in developing and managing operators in Kubernetes.
If you are interested in testing the use of the Spark cluster you created above, see the example Jupyter notebook on GitHub. Before running it, replace the openshift-cluster-hostname with the node’s hostname where the spark-cluster service is running.
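As a rough sketch of how the connection string could be assembled, the helper below takes a node hostname and the PORT(S) column from the oc get svc output and prints a Spark master URL in the standard spark://host:port form. The function name and the hostname node1.example.com are hypothetical, and whether the notebook expects exactly this form is an assumption.

```shell
# Hypothetical helper: build a Spark master URL from a node hostname and the
# PORT(S) column of "oc get svc", for example "7077:31687/TCP,8080:32142/TCP".
# It picks the node port that is mapped to the Spark master port 7077.
spark_master_url() {
  local host="$1" ports="$2" node_port
  node_port="$(printf '%s\n' "$ports" | tr ',' '\n' | sed -n 's|^7077:\([0-9]*\)/TCP$|\1|p')"
  printf 'spark://%s:%s\n' "$host" "$node_port"
}

# Example with the service output shown earlier and an assumed hostname.
spark_master_url node1.example.com '7077:31687/TCP,8080:32142/TCP'
```

The node port changes each time the service is re-created, so it is worth reading it from the service rather than hard-coding it.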