
IBM Developer Blog


Kubeflow Pipelines on Tekton reaches 1.0 and Watson Studio Pipelines is now available in open beta


Our last blog post announcing Kubeflow Pipelines on Tekton discussed how Kubeflow Pipelines became a primary vehicle to address the needs of both DevOps engineers and data scientists. As a reminder, Kubeflow Pipelines on Tekton is a project in the MLOps ecosystem, and offers the following benefits:

  • For DevOps folks, Kubeflow Pipelines taps into the Kubernetes ecosystem, leveraging its scalability and containerization principles.
  • For data scientists and MLOps practitioners, Kubeflow Pipelines offers a Python interface to define and deploy pipelines, enabling metadata collection and lineage tracking.
  • For DataOps folks, Kubeflow Pipelines brings in ETL bindings with support for multiple ETL components and use cases, so they can collaborate more fully with their peers.

The pipelines team has spent the last few months enhancing Kubeflow Pipelines on Tekton to handle more MLOps and DataOps needs and to create a stable, production-ready deliverable. As part of this, we are excited to announce that the project has reached the 1.0 milestone. Additionally, IBM’s offering built on top of this open source project, Watson Studio Pipelines, is now available in open beta!

Kubeflow Pipelines on Tekton 1.0 release

We are excited to announce the 1.0 release of the Kubeflow Pipelines on Tekton (KFP-Tekton) project. Features such as graph recursion, conditional loops, caching, AnySequencer, and dynamic parameter support were added to the project on the way to this milestone. These features are not supported natively in the Tekton project, but they are crucial for running real-world machine learning workflows using Kubeflow Pipelines.

This blog post highlights some of the new functionality released in this version, specifically the features that handle data flows.

These enhancements include:

Pipeline loops

The current Tekton design doesn’t allow any loop or sub-pipeline inside the pipeline definition. Recently, Tekton introduced custom tasks, which let users define their own workloads by building their own controller reconcile methods. This opened the door for us to support Kubeflow Pipelines loops and recursion, which weren’t possible before on Tekton. We are contributing these enhancements back to the Tekton community.

The ParallelFor loop in Kubeflow Pipelines runs tasks over a set of parameters in parallel. For Tekton, the KFP-Tekton team built a custom task controller that reconciles multiple Tekton sub-pipelines in parallel over a set of parameters (both static and dynamic) and supports a parallelism setting to limit the number of sub-pipelines running at once.

This is a huge step forward for what we can achieve on Tekton, and it allows Tekton to handle pipelines that are much more complex.
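As a rough illustration of the ParallelFor semantics described above, the following minimal Python sketch fans a hypothetical sub-pipeline out over a list of parameters while capping how many runs are in flight. It does not use the Kubeflow Pipelines SDK or Tekton; `run_sub_pipeline` and `parallel_for` are names invented for this sketch, and a thread pool merely stands in for the custom task controller.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sub_pipeline(item):
    """Hypothetical stand-in for one sub-pipeline run over a single loop item."""
    return f"processed-{item}"


def parallel_for(items, sub_pipeline, parallelism):
    """Run sub_pipeline once per item, with at most `parallelism` runs in flight."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # map() returns results in the same order as `items`.
        return list(pool.map(sub_pipeline, items))


# Four loop items, but never more than two sub-pipelines running at once.
results = parallel_for(["a", "b", "c", "d"], run_sub_pipeline, parallelism=2)
print(results)
```

The `parallelism` argument plays the role of the parallelism setting mentioned above: the loop still covers every item, but only that many iterations execute concurrently.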

The diagram below describes the flows for three different types of parallel loops.

  • Typical loops are loops that traverse a list of tasks over one argument.
  • Multi-args loops are similar to typical loops but with multiple arguments.
  • Condition loops are loops that can break or continue based on a certain condition.

pipeline loop image

Recursion

Recursion enables the same code block to execute and exit based on dynamic conditions. Current Tekton features don’t allow for recursion.

However, with the new Tekton custom task controller that the KFP-Tekton team built for loops and sub-pipelines, we can now run sub-pipelines with conditions that refer back to themselves to create recursion, and this can be extended to cover nested parallel loops inside recursions. This demonstrates how the KFP-Tekton team is leading the development of some cutting-edge features for Tekton and contributing them back to the Tekton community.

The following diagram shows that the recursive function is defined as a sub-pipeline and can refer back to itself to create recursions.

Recursion flow
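The idea of a sub-pipeline that refers back to itself under a dynamic condition can be sketched in plain Python as follows. This is only a conceptual model, not KFP-Tekton code: the function name, the halving "task", and the `max_depth` guard are all assumptions made for the illustration.

```python
def recursive_sub_pipeline(value, threshold, depth=0, max_depth=50):
    """Hypothetical sub-pipeline that re-invokes itself while a condition holds."""
    # Guard so this sketch can never recurse without bound.
    if depth >= max_depth:
        raise RuntimeError("recursion limit reached")
    value = value // 2  # the "task" inside the sub-pipeline
    if value > threshold:  # dynamic condition evaluated on the task output
        return recursive_sub_pipeline(value, threshold, depth + 1, max_depth)
    return value, depth + 1  # final output and number of iterations run


final_value, iterations = recursive_sub_pipeline(100, threshold=10)
print(final_value, iterations)  # 6 4
```

The exit condition is evaluated on the output of each iteration, which is what makes the number of iterations dynamic rather than fixed at compile time.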

Pluggable Tekton custom task

The KFP-Tekton team also worked on a new way to enable users to plug their own Tekton custom task into a Kubeflow Pipeline. For example, a user might want to calculate an expression without creating a new worker pod. In this case, the user can plug in the Common Expression Language (CEL) custom task from Tekton to calculate the expression inside a shared controller without creating a new worker pod.

The pluggable Tekton custom task in Kubeflow Pipelines gives more flexibility to users who want to optimize their pipelines further and compose tasks that are currently not possible with the default Tekton task API. The KFP-Tekton team also contributes to Tekton to make the custom task API more feature complete, for example by supporting timeout, retry, and inlined custom task specs.

The image below shows how the regular tasks A and D each run inside a new dedicated pod, whereas the custom tasks B and C run inside a shared controller to save pod provisioning time and cluster resources.

image showing Tekton tasks completion
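The difference between regular and custom tasks can be modeled with a small toy sketch. It involves no Kubernetes, Tekton, or CEL: the `Task` class, the `custom` flag, and the pod counter are assumptions made purely for illustration, and Python lambdas stand in for the expressions a CEL custom task would evaluate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Task:
    name: str
    run: Callable[[Dict[str, int]], int]
    custom: bool = False  # True: evaluated in the shared controller, no pod


def run_pipeline(tasks: List[Task]) -> Tuple[Dict[str, int], int]:
    """Run tasks in order; return (results, number of pods provisioned)."""
    results: Dict[str, int] = {}
    pods = 0
    for task in tasks:
        if not task.custom:
            pods += 1  # a regular task gets its own dedicated pod
        results[task.name] = task.run(results)
    return results, pods


pipeline = [
    Task("A", lambda r: 3),
    Task("B", lambda r: r["A"] * 2, custom=True),  # expression-style task
    Task("C", lambda r: r["B"] + 1, custom=True),  # expression-style task
    Task("D", lambda r: r["C"] * 10),
]
results, pods = run_pipeline(pipeline)
print(results, pods)  # {'A': 3, 'B': 6, 'C': 7, 'D': 70} 2
```

In this model, four tasks run but only two pods are counted, which mirrors the saving described above for tasks B and C.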

AnySequencer

AnySequencer is a dependent task that starts when any one of its task or condition dependencies completes successfully. The benefit of AnySequencer over a logical OR condition is that the order in which the dependencies execute doesn’t matter: the pipeline doesn’t wait for all of the task dependencies to complete before moving to the next step. You can apply conditions to ensure that the task dependencies complete as expected.

The following image shows how the AnySequencer task can start a new task while an original task is waiting for a dependency.

AnySequencer image
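The "first successful dependency wins" behavior can be sketched with Python's standard `concurrent.futures` module. This is a conceptual model only, not the KFP-Tekton implementation: `any_sequencer` and the three sample dependencies are hypothetical, and a raised exception is assumed to represent a failed dependency.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def any_sequencer(dependencies):
    """Return the name of the first dependency that completes successfully.

    `dependencies` maps a name to a zero-argument callable; a raised
    exception counts as a failed dependency.
    """
    if not dependencies:
        raise ValueError("at least one dependency is required")
    pool = ThreadPoolExecutor(max_workers=len(dependencies))
    futures = {pool.submit(fn): name for name, fn in dependencies.items()}
    try:
        for future in as_completed(futures):
            if future.exception() is None:
                return futures[future]
        raise RuntimeError("no dependency completed successfully")
    finally:
        # Do not block on the dependencies that are still running.
        pool.shutdown(wait=False)


def fast():
    time.sleep(0.05)
    return "fast done"


def slow():
    time.sleep(0.5)
    return "slow done"


def failing():
    raise ValueError("boom")


winner = any_sequencer({"slow": slow, "failing": failing, "fast": fast})
print(winner)  # fast
```

Note that the failing dependency finishes first but is skipped, and the downstream step is released as soon as `fast` succeeds, without waiting for `slow`.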

Caching

Kubeflow Pipelines caching provides task-level output caching. Unlike Argo, Tekton by design doesn’t generate the task template in the annotations that caching relies on. To support caching on Tekton, we enhanced the Kubeflow Pipelines cache server to auto-generate the task template for Tekton and use its hash code as the cache key, so that identical workloads with the same inputs are served from the cache.

By default, compiling a pipeline adds metadata annotations and labels so that results from tasks within a pipeline run can be reused if that task is reused in a new pipeline run. This saves the pipeline run from re-executing the task when the results are already known.

The following diagram shows the caching mechanism for Kubeflow Pipelines on Tekton (KFP-Tekton). All task executions and results are stored under hash codes in the database, which are used to identify cached tasks.

caching flow
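A minimal sketch of hash-based task caching, under the assumption that the cache key is a digest of the task template plus its inputs, might look like the following. The `cache_key` function, the `CachingRunner` class, and the `example/train:1` image name are hypothetical; the real cache server stores its entries in a database rather than an in-memory dictionary.

```python
import hashlib
import json


def cache_key(task_template, inputs):
    """Hash the task template together with its inputs into a stable key."""
    payload = json.dumps(
        {"template": task_template, "inputs": inputs}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachingRunner:
    def __init__(self):
        self.store = {}  # stands in for the cache database
        self.executions = 0  # how many times a task actually ran

    def run(self, task_template, inputs, execute):
        key = cache_key(task_template, inputs)
        if key in self.store:
            return self.store[key]  # cache hit: skip execution
        self.executions += 1
        self.store[key] = execute(inputs)  # cache miss: run and record
        return self.store[key]


runner = CachingRunner()
template = {"image": "example/train:1", "command": ["train"]}  # hypothetical
first = runner.run(template, {"epochs": 3}, lambda i: i["epochs"] * 10)
second = runner.run(template, {"epochs": 3}, lambda i: i["epochs"] * 10)
third = runner.run(template, {"epochs": 5}, lambda i: i["epochs"] * 10)
print(first, second, third, runner.executions)  # 30 30 50 2
```

The second call reuses the stored result because both the template and the inputs are identical; changing either one produces a different key and triggers a fresh execution.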

Watson Studio Pipelines now available in Open Beta!

We are excited to announce that Watson Studio Pipelines is now available in Open Beta! This new Watson Studio offering allows users to create repeatable and scheduled flows that automate notebook, data refinery, and machine learning pipelines, from data ingestion to model training, testing, and deployment. With an intuitive user interface, Watson Studio Pipelines exposes all of the state-of-the-art data science tools available in Watson Studio and allows users to combine them into automation flows, creating continuous integration / continuous delivery pipelines for AI.

Watson Studio Pipelines is built on Kubeflow Pipelines on the Tekton runtime and is fully integrated into the Watson Studio platform, allowing users to combine tools including:

  • Notebooks
  • Data refinery flows
  • AutoAI experiments
  • Web service / online deployments
  • Batch deployments
  • Import and export of project and space assets

New features, driven by DataOps scenarios and leveraging the new Tekton extensions, are coming soon.

The following example showcases how to import datasets into Watson Studio using a DataStage flow, create and run AutoAI experiments with hyperparameter optimization, and serve the best tuned model as a web service. It sends a notification in case of a failure and finally executes a custom user script.

Watson Studio Pipelines example flow image

To experience this AI lifecycle automation for yourself, please go to the Watson Studio Pipelines beta page.

Join us to build cloud-native Data and AI Pipelines with Kubeflow Pipelines and Tekton

Please join us on the Kubeflow Pipelines with Tekton GitHub repository, try it out, give feedback, and raise issues. Additionally, you can connect with us via the following:

  • To contribute and build an enterprise-grade, end-to-end machine learning platform on OpenShift and Kubernetes, please join the Kubeflow community and reach out with any questions, comments, and feedback!
  • To get access to Watson AI Pipelines, sign up for the beta access list.
  • If you want help deploying and managing Kubeflow on your on-premises Kubernetes platform, OpenShift, or on IBM Cloud, please connect with us.
  • To run Notebook-based pipelines using a drag-and-drop canvas, please check out the Elyra project in the community, which provides AI-centric extensions to JupyterLab.
  • Check out the OpenDataHub if you are interested in open source projects in the Data and AI portfolio, namely Kubeflow, Kafka, Hive, Hue, and Spark, and how to bring them together in a cloud-native way.

Summary

This blog post introduced you to some of the new enhancements that we’ve been working on to make Kubeflow Pipelines on Tekton more extensible for users. We hope that the new functionality helps you meet your DataOps needs.

Thanks to our contributors

Thanks to the many contributors to Kubeflow Pipelines with Tekton, both internal and external, for their work on various aspects of the project. A few I want to specifically call out include:

  • Adam Massachi
  • Christian Kadner
  • Jun Feng Liu
  • Yi-Hong Wang
  • Prashant Sharma
  • Feng Li
  • Andrew Butler
  • Jin Chi He
  • Michalina Kotwica
  • Andrea Frittoli
  • Priti Desai
  • Gang Pu
  • Peng Li
  • Błażej Rutkowski

Additionally, thanks to the OpenShift Pipelines and Tekton teams from Red Hat, and to the Elyra team, for their feedback. Last but not least, thanks to the Kubeflow Pipelines team from Google for their help and support.