Get a deeper understanding of the Tekton Pipeline architecture and experiment with adding Python support to Tekton Pipelines.


My colleague Priti Desai has been working on Tekton for more than a year and has made some great contributions. After seeing how much fun she was having, I decided to take a leap in the same direction. Priti already built a Tekton pipeline for Java and JavaScript applications, so I figured adding Python support to her pipeline was a great way to become familiar with Tekton.

What is Tekton?

Tekton is a continuous integration and continuous delivery (CI/CD) pipeline that can operate natively inside your Kubernetes cluster. Tekton’s website offers a complete description of the open source project and there’s even this informative What is Tekton? lightboard video, but I’d still like to give my own brief description of the Tekton Pipeline architecture.

Note that several common words, such as “task” and “step,” can become overloaded here. To avoid confusion, I will use capitalization to distinguish Tekton resources from the everyday words: “Task” refers to a Tekton Task, while in “the task of writing this blog,” task has its usual meaning.

Tasks and Steps

Tekton Pipelines are composed primarily of Tasks, Pipelines, and PipelineRuns, all written in YAML. A Task is a logical unit of work, analogous to a function – it can even take parameters! Each Task is composed of one or more Steps, which are the actual containers that do the work. A Step can run a script, a particular command, or any other appropriate container-based operation. Each Task gets its own Kubernetes Pod in which the containers for its Steps run. In this programming analogy, a Step corresponds to an individual statement within the function.
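
To make this concrete, here is a minimal, hypothetical Task with one parameter and two Steps. It is purely illustrative (sketched against the v1beta1 API, while the resources later in this post use v1alpha1) and is not part of the pipeline described below:

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: greet   # hypothetical example Task, not part of the OpenWhisk pipeline
spec:
  params:
    - name: name
      type: string
      default: "world"
  steps:
    # Each Step runs as its own container inside the Task's Pod.
    - name: say-hello
      image: alpine
      script: |
        #!/bin/sh
        echo "Hello, $(params.name)!"
    - name: say-goodbye
      image: alpine
      script: |
        #!/bin/sh
        echo "Goodbye, $(params.name)!"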

Pipelines

Pipelines are how the individual Tasks are orchestrated to achieve the end goal, roughly analogous to the main function in a program. The Pipeline is also where you declare any other resources necessary for it to run. Keep in mind that this is just the general Task orchestration, not the actual instantiation of the Pipeline.

An additional nicety is Tekton Workspaces. A Workspace lets a single persistent volume be shared across multiple Tasks and referenced by each Task in a unified way. This makes it easier to use the result of one Task in another, or even to share data between Steps in the same Task.
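
As a sketch of how this fits together, here is a hypothetical Pipeline that shares one Workspace between two made-up Tasks (git-clone-source and run-unit-tests, each assumed to declare its own workspace). None of these names come from the pipeline discussed in this post:

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: build-and-test   # hypothetical example Pipeline
spec:
  workspaces:
    - name: shared-data   # one persistent volume every Task below can reference
  tasks:
    - name: fetch-source
      taskRef:
        name: git-clone-source   # hypothetical Task that clones a repo into its "output" workspace
      workspaces:
        - name: output
          workspace: shared-data
    - name: run-tests
      runAfter:
        - fetch-source
      taskRef:
        name: run-unit-tests   # hypothetical Task that reads the clone from its "source" workspace
      workspaces:
        - name: source
          workspace: shared-data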

PipelineRuns

Finally, PipelineRuns are the instantiation of a general Pipeline to perform a specific goal. Continuing the analogy, this is how you would actually run the Pipeline with specific inputs from the user. This is basically the difference between the source code or executable for the sed command and actually running sed 's/foo/bar/g' to get results.
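
A PipelineRun for the hypothetical Pipeline sketched above would simply reference it, bind its Workspace to a real volume, and supply any user-specific values:

apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: build-and-test-run   # hypothetical example PipelineRun
spec:
  pipelineRef:
    name: build-and-test   # the Pipeline sketched above
  workspaces:
    - name: shared-data
      persistentVolumeClaim:
        claimName: shared-data-pvc   # a PersistentVolumeClaim you would create beforehand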

Great! You now have a high-level understanding of the Tekton Pipeline I’m about to describe.

Python to Knative Tekton Pipeline

As stated earlier, I am building on the work documented by Priti. To learn more about the work I’m referencing, see Priti’s blog post discussing the Java and JavaScript Pipeline. She mentions that the Pipeline could be extended to work with other OpenWhisk runtimes, and that is just what I have done: I extended the Pipeline to work with the OpenWhisk Python runtime as well. In theory, this means you could seamlessly make your OpenWhisk Python Actions run on Knative.

In order to achieve this, you follow the same basic outline as described in Priti’s blog.

Architecture Diagram

Note that labels 1 and 4 are done by Tekton itself, based on resources you specify in the Pipeline and PipelineRun, and are not explicit Tasks in your Pipeline.

Task 1

The first Task in the Python Pipeline installs the app dependencies described by the requirements.txt file. To do this, you must first create a virtual environment in the app folder, which ensures that any third-party libraries are retrieved and packaged with the application. There are four basic steps (a rough sketch of the Task appears below):

  1. Retrieve the app source.
  2. Create the virtual environment.
  3. Take note of the default packages installed in the virtual environment.
  4. Install the necessary requirements with pip.

Step 3 is required because you do not want to package the virtual environment’s default libraries with your application; they are already installed on the system. Excluding them reduces both the size of the archive you create in the next Task and the startup time at invocation.
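
The actual Task lives in the openwhisk-build repo; what follows is only a rough sketch of its shape, with hypothetical names and the v1beta1 API, and it assumes the app source has already been placed on the Workspace:

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: install-deps   # hypothetical name; the real Task may differ
spec:
  workspaces:
    - name: openwhisk-workspace   # shared volume that already holds the cloned app source
  params:
    - name: OW_APP_PATH
      type: string
      default: ""
  steps:
    - name: install-virtualenv
      image: python:3.7
      workingDir: $(workspaces.openwhisk-workspace.path)/$(params.OW_APP_PATH)
      script: |
        #!/bin/bash
        set -e
        # Create the virtual environment inside the app folder.
        pip install virtualenv
        virtualenv virtualenv
        # Record what a fresh virtualenv contains so those packages can be
        # excluded from the archive later (step 3 above).
        ls virtualenv/lib/python3.7/site-packages > default-packages.txt
        # Install the app's third-party dependencies into the virtualenv.
        ./virtualenv/bin/pip install -r requirements.txt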

If you’d like to learn more about packaging Python for OpenWhisk, I’d recommend reading Python Packages in OpenWhisk by James Thomas.

Task 2

The app source code and necessary third-party libraries are then packaged into a zip file that the OpenWhisk runtime can use; a sketch of this Task follows the list below.

  1. Create a list of packages in the virtual environment that you do not want to include in the zip file (these are the packages from step 3 above).
  2. Create a list of all packages that are present in the virtual environment and filter out the ones you want to exclude.
  3. Zip those packages, the app source code, and virtualenv/bin/activate_this.py, which tells Python where to locate your third-party libraries.
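
Again, here is a hypothetical sketch of what such a Task could look like; the real Task in the openwhisk-build repo may be organized differently:

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: package-app   # hypothetical name; the real Task may differ
spec:
  workspaces:
    - name: openwhisk-workspace
  params:
    - name: OW_APP_PATH
      type: string
      default: ""
  steps:
    - name: build-archive
      image: python:3.7   # assumes a Debian-based image so zip can be installed
      workingDir: $(workspaces.openwhisk-workspace.path)/$(params.OW_APP_PATH)
      script: |
        #!/bin/bash
        set -e
        apt-get update -qq && apt-get install -y -qq zip
        SITE_PACKAGES=virtualenv/lib/python3.7/site-packages
        # Filter out the packages that ship with a fresh virtualenv (recorded
        # by the previous Task); everything left over is a real dependency.
        ls "$SITE_PACKAGES" | grep -v -x -F -f default-packages.txt > app-packages.txt
        # Zip the remaining packages, the app source (assumed to be __main__.py),
        # and activate_this.py, which tells Python where the bundled libraries live.
        (cd "$SITE_PACKAGES" && cat "$OLDPWD/app-packages.txt" | xargs zip -r "$OLDPWD/action.zip")
        zip -r action.zip __main__.py virtualenv/bin/activate_this.py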

Task 3

In this Task, you modify the Dockerfile for the OpenWhisk runtime to include the necessary environment variables, build the image, and upload it to a Docker repository.

  1. Base64-encode the zip file and insert it into the __OW_ACTION_CODE environment variable; this is where OpenWhisk finds the action code to unzip and execute later. This step is a little tricky: if the line grows too long, the Dockerfile becomes invalid, and even the smallest pip packages can push the encoding past that limit. To overcome this, use the fold command to wrap the encoding at 80 characters and use sed to append an escape to each line, which keeps larger zip files from overwhelming the buffer.

  2. Use the Kaniko executor to build an image from this newly created Dockerfile and push it to the Docker repository of your choosing.

I’d like to share that I spent a fair bit of time piecing together step 1 of Task 3. I did not even know the fold command existed until I needed it. fold is an example of a good Unix tool: it does nothing more than fold long lines at a specific number of characters or bytes. Combined with a short sed script to escape the newlines and drop the final trailing one, you are no longer bound like mere mortals to a specific buffer length. It’s a good trick to keep in your back pocket for when you need to fill large environment variables (or any other large generated lines) in a Dockerfile and run into its line limits. I had not seen this documented elsewhere, so I wanted to make a quick note of it in case you ever run into the same problem.
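
As a rough, hypothetical sketch of that trick (the real Task in the openwhisk-build repo may use a slightly different sed expression, and the paths here are illustrative), the Step that injects the encoded archive could look something like this:

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: inject-action-code   # hypothetical name; the real Task may differ
spec:
  workspaces:
    - name: openwhisk-workspace   # assumed to hold both action.zip and the runtime checkout
  steps:
    - name: append-env-to-dockerfile
      image: python:3.7
      workingDir: $(workspaces.openwhisk-workspace.path)
      script: |
        #!/bin/bash
        set -e
        # Base64-encode the archive, wrap it at 80 characters with fold, and
        # use sed to end every line except the last with a backslash so the
        # Dockerfile parser treats the whole value as one logical line.
        {
          printf 'ENV __OW_ACTION_CODE '
          base64 action.zip | fold -w 80 | sed -e 's/$/\\/' -e '$s/\\$//'
        } >> core/python3Action/Dockerfile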

Example of a run

Using the example from James Thomas’ blog, I created a repository that has a simple joke program. Then I created a PipelineRun that specifies that repo as the app source:

apiVersion: tekton.dev/v1alpha1
kind: PipelineRun
metadata:
  name: build-app-image
spec:
  serviceAccountName: openwhisk-app-builder
  pipelineRef:
    name: build-openwhisk-app
  workspaces:
    - name: openwhisk-workspace
      persistentVolumeClaim:
        claimName: openwhisk-workspace
  params:
    - name: OW_APP_PATH
      value: ""
    - name: DOCKERFILE
      value: "core/python3Action/Dockerfile"
    - name: OW_ACTION_NAME
      value: "openwhisk-padding-app"
  resources:
    - name: app-git
      resourceSpec:
        type: git
        params:
          - name: url
            value: https://github.com/pwplusnick/jokes-test.git
    - name: runtime-git
      resourceSpec:
        type: git
        params:
          - name: url
            value: https://github.com/apache/openwhisk-runtime-python.git
    - name: app-image
      resourceSpec:
        type: image
        params:
          - name: url
            value: docker.io/pwplusni/openwhisk-jokes

I applied that PipelineRun to my Kubernetes cluster, which already had Tekton installed and the necessary assets from the openwhisk-build repo applied. After a few minutes, I saw a successful build and a new image in my Docker image repository. I then applied this simple Knative service:

apiVersion: serving.knative.dev/v1alpha1
kind: Service
metadata:
  name: openwhisk-python-app
  namespace: default
spec:
  runLatest:
    configuration:
      revisionTemplate:
        spec:
          container:
            image: docker.io/pwplusni/openwhisk-padding-app:latest

Now I can curl that service for hilarious jokes, such as {"joke": "I had a problem so I thought I'd use Java. Now I have a ProblemFactory."}, to my heart’s content. I hope that made you laugh!

Summary

I hope this blog has been informative and sparks your curiosity to learn more about Tekton and its capabilities. Tekton comes with a very welcoming community that I encourage you to explore! If you’d like to give this example a try and need a Kubernetes cluster, please grab a free cluster from IBM Cloud Kubernetes Service – no credit card needed! And finally, as always, stay safe, have fun, and happy hacking!

Will Plusnick