IBM's AI platform, watsonx, leverages several key open source AI tools and technologies and combines them with IBM Research innovations to enable enterprise-ready AI workflows that are built with responsibility, transparency, and explainability. At IBM, we believe that open source is a bedrock of modern computing, and this is even more true for the future of AI.
Learn how open source impacts watsonx through some of the key open source projects that IBM has invested in. We work to ensure a healthy future for these projects through contributing code, knowledge, and resources. After all, the developers, architects, and sysadmins that have chosen an open-source-based product like watsonx count on IBM to represent their needs in these key strategic open source communities.
CodeFlare and CodeFlare SDK: Simplified model development
Project CodeFlare provides a simple, user-friendly abstraction for developing, scaling, queuing, and managing distributed AI/ML and Python workloads on the Red Hat OpenShift Container Platform. CodeFlare automates and simplifies the execution and scaling of the steps in the model development life cycle, from data pre-processing and distributed model training to model adaptation and validation.
The CodeFlare SDK is an intuitive, easy-to-use Python interface for batch resource requesting, access, job submission, and observation. It simplifies the developer's life while enabling access to high-performance compute resources, either in the cloud or on-prem.
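For example, a minimal sketch of requesting a small Ray cluster through the CodeFlare SDK might look like the following; the cluster name, namespace, token, and resource sizes are illustrative, and exact ClusterConfiguration parameter names can vary across SDK versions.

```python
# A minimal sketch of requesting a Ray cluster with the CodeFlare SDK.
# The cluster name, namespace, token, and resource sizes are placeholders,
# and exact ClusterConfiguration parameter names can vary across SDK versions.
from codeflare_sdk import Cluster, ClusterConfiguration, TokenAuthentication

# Authenticate against the OpenShift cluster (token and server are placeholders)
auth = TokenAuthentication(token="sha256~<token>", server="https://api.example.com:6443")
auth.login()

# Describe the desired Ray cluster: 2 workers with CPU, memory, and GPU requests
cluster = Cluster(ClusterConfiguration(
    name="demo-ray-cluster",
    namespace="my-project",
    num_workers=2,
    min_cpus=4,
    max_cpus=4,
    min_memory=8,
    max_memory=8,
    num_gpus=1,
))

cluster.up()          # queue the request; the cluster is created when resources are available
cluster.wait_ready()  # block until the Ray cluster is up
print(cluster.details())

cluster.down()        # tear the cluster down when finished
```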
CodeFlare integrates with the Ray and PyTorch frameworks, enabling data scientists to spend more time on model development and less time on infrastructure deployment and scaling. It is integrated into Open Data Hub (ODH) as a tier-one component for distributed workloads.
Ray is an open source unified framework for scaling AI and running distributed Python workloads. With Ray, we enable scalable data pre-processing steps (such as filtering data with hate, abuse, and profanity filters) and post-processing steps (such as model fine-tuning and validation) through a data scientist-friendly Python API.
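As a small, self-contained illustration of that Python-first style, the sketch below filters a toy in-memory dataset in parallel with Ray Data; the blocked-word list stands in for a real hate, abuse, and profanity filter.

```python
# Minimal Ray sketch: parallel filtering of a toy dataset with Ray Data.
# The blocked-word list stands in for a real hate, abuse, and profanity filter.
import ray

ray.init()  # starts Ray locally; on a cluster this connects to the running Ray head

BLOCKED = {"badword1", "badword2"}  # illustrative placeholder terms

def is_clean(row):
    # Keep only documents that contain none of the blocked terms
    return not any(term in row["text"].lower() for term in BLOCKED)

docs = ray.data.from_items([
    {"text": "A perfectly fine training document."},
    {"text": "This one contains badword1 and should be dropped."},
])

clean = docs.filter(is_clean)  # evaluated in parallel across the Ray cluster
print(clean.take_all())

ray.shutdown()
```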
Multi-Cluster-App-Dispatcher (MCAD) provides support for queueing, resource quotas, and management of batch jobs on large clusters. By running KubeRay with MCAD, we guarantee that sufficient resources are available in the cluster prior to actual Ray resource creation in the Kubernetes cluster. KubeRay is an open source Kubernetes operator for deployment and management of Ray applications on Kubernetes. IBM has made many contributions in Ray workflow, data sets, python libraries, and KubeRay repos.
PyTorch is a fully featured open source AI framework for building deep learning models.
PyTorch runs on Red Hat OpenShift, and we've demonstrated efficient scaling of distributed training jobs for models with over 10 billion parameters in Ethernet-based environments. Two APIs enable distributed training: Distributed Data Parallel (DDP) for smaller models and Fully Sharded Data Parallel (FSDP) for large models (greater than 3 billion parameters).
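As a rough sketch of how FSDP is used, the snippet below wraps a toy model so that its parameters, gradients, and optimizer state are sharded across ranks. The model and hyperparameters are placeholders; a real job would be launched with torchrun across many GPU ranks, and DDP would be the choice for smaller models.

```python
# Illustrative sketch of wrapping a model with PyTorch FSDP. The toy model is a placeholder;
# real jobs are launched with torchrun across many ranks on a GPU cluster, and DDP would be
# the choice for smaller models.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group(backend="nccl")  # torchrun sets rank/world-size environment variables
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096),
        torch.nn.ReLU(),
        torch.nn.Linear(4096, 1024),
    ).cuda()

    # FSDP shards parameters, gradients, and optimizer state across ranks;
    # limit_all_gathers enables the rate-limiting behavior described below.
    model = FSDP(model, limit_all_gathers=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```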
IBM is contributing to many areas of PyTorch, including enhancements to distributed training in the Fully Sharded Data Parallel (FSDP) API through the introduction of a rate limiter. Overall, IBM has contributed to PyTorch in areas such as PyTorch core, TorchBench, torch.distributed, documentation, and tutorials.
Kubeflow Training Operator: Fine-tuning and scalable distributed training
Kubeflow Training Operator is a Kubernetes-native project for fine-tuning and scalable distributed training of machine learning (ML) models created with various ML frameworks such as PyTorch, TensorFlow, XGBoost, and others.
Users can integrate other ML libraries such as Hugging Face, DeepSpeed, or Megatron-LM with the Training Operator to orchestrate their ML training on Kubernetes.
The Training Operator allows you to use Kubernetes workloads to effectively train large models via the Kubernetes Custom Resources APIs or the Training Operator Python SDK.
The Training Operator provides custom resources that make it easy to run distributed or non-distributed TensorFlow, PyTorch, Apache MXNet, XGBoost, or MPI jobs on Red Hat OpenShift, as in the sketch below.
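The snippet below builds a minimal PyTorchJob custom resource and submits it with the Kubernetes Python client; the image, command, and namespace are placeholders, and the Training Operator Python SDK offers a higher-level alternative to this.

```python
# Sketch of a minimal PyTorchJob custom resource submitted with the Kubernetes Python client.
# The image, command, and namespace are illustrative placeholders; the Kubeflow Training
# Operator Python SDK (kubeflow.training) offers a higher-level alternative.
from kubernetes import client, config

pytorch_job = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "PyTorchJob",
    "metadata": {"name": "demo-pytorch-job", "namespace": "my-project"},
    "spec": {
        "pytorchReplicaSpecs": {
            "Master": {
                "replicas": 1,
                "restartPolicy": "OnFailure",
                "template": {"spec": {"containers": [{
                    "name": "pytorch",
                    "image": "quay.io/example/train:latest",   # placeholder image
                    "command": ["python", "train.py"],
                }]}},
            },
            "Worker": {
                "replicas": 2,
                "restartPolicy": "OnFailure",
                "template": {"spec": {"containers": [{
                    "name": "pytorch",
                    "image": "quay.io/example/train:latest",   # placeholder image
                    "command": ["python", "train.py"],
                }]}},
            },
        }
    },
}

config.load_kube_config()  # or load_incluster_config() inside the cluster
client.CustomObjectsApi().create_namespaced_custom_object(
    group="kubeflow.org", version="v1", namespace="my-project",
    plural="pytorchjobs", body=pytorch_job,
)
```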
Job Scheduler (Kueue/MCAD): Distribute workloads for model training
We prioritize two main job scheduler projects, both of which support job queueing based on priorities with different strategies.
The multi-cluster-app-dispatcher (MCAD) is a Kubernetes controller providing mechanisms for applications to manage batch jobs in a single or multi-cluster environment. It provides an abstraction called an AppWrapper which wraps all resources of a job or application, treating them holistically. Specifically, MCAD allows you to queue each of your Ray or Kubeflow Training Operator workloads until resource availability requirements are met. With MCAD, your workload pods will only be created once there is a guarantee that all of the pods can be scheduled.
Kueue is a Kubernetes-native system that manages quotas and how jobs consume them. Kueue decides when a job should wait, when a job should be admitted to start (that is, when its pods can be created), and when a job should be preempted (that is, when its active pods should be deleted). We integrated the AppWrapper abstraction to interoperate smoothly with Kueue, providing a flexible and workload-agnostic mechanism that lets Kueue manage a group of Kubernetes resources as a single logical unit without requiring any Kueue-specific support in the controllers of those resources. AppWrappers are also designed to harden workloads by providing an additional level of automatic fault detection and recovery.
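As a small illustration of how a workload opts into Kueue, the sketch below creates a plain Kubernetes Job that starts suspended and references a hypothetical LocalQueue through the standard kueue.x-k8s.io/queue-name label; Kueue admits the job by unsuspending it once quota is available. The queue name, image, and namespace are placeholders.

```python
# Sketch of a Kubernetes Job managed by Kueue. The queue name, image, and namespace
# are illustrative; the job is created suspended and carries the standard
# kueue.x-k8s.io/queue-name label so that Kueue decides when it may start.
from kubernetes import client, config

job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {
        "name": "demo-training-job",
        "namespace": "my-project",
        "labels": {"kueue.x-k8s.io/queue-name": "team-a-queue"},  # hypothetical LocalQueue
    },
    "spec": {
        "suspend": True,  # Kueue admits the job by flipping this to false
        "template": {"spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "trainer",
                "image": "quay.io/example/train:latest",  # placeholder image
                "command": ["python", "train.py"],
            }],
        }},
    },
}

config.load_kube_config()
client.BatchV1Api().create_namespaced_job(namespace="my-project", body=job)
```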
KServe/ModelMesh: Highly scalable model inferencing
KServe is a highly scalable and standards-based model inference platform on Kubernetes. It's an open source, cloud-agnostic model inference platform that provides pluggable production serving. KServe provides a Kubernetes Custom Resource Definition for serving predictive and generative machine learning (ML) models. It aims to solve production model serving use cases by providing high-level abstraction interfaces for TensorFlow, XGBoost, scikit-learn, PyTorch, and Hugging Face Transformer/LLM models using standardized data plane protocols.
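For instance, a minimal sketch of deploying a scikit-learn model with the KServe Python SDK could look like this; the name, namespace, and storage URI are placeholders.

```python
# Sketch of deploying a scikit-learn model with the KServe Python SDK.
# The name, namespace, and storage URI are placeholders; KServe pulls the model from
# the given storage location and exposes it behind a standardized data plane.
from kubernetes import client as k8s
from kserve import (KServeClient, V1beta1InferenceService,
                    V1beta1InferenceServiceSpec, V1beta1PredictorSpec, V1beta1SKLearnSpec)

isvc = V1beta1InferenceService(
    api_version="serving.kserve.io/v1beta1",
    kind="InferenceService",
    metadata=k8s.V1ObjectMeta(name="sklearn-demo", namespace="my-project"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            sklearn=V1beta1SKLearnSpec(storage_uri="gs://example-bucket/models/sklearn/model"),
        )
    ),
)

KServeClient().create(isvc)  # creates the InferenceService custom resource in the cluster
```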
Developed collaboratively by Google, IBM, Bloomberg, NVIDIA, and Seldon in 2019, KServe has served IBM Watson products like NLU, Assistant, and Discovery in production for several years, and it is used by Red Hat OpenShift Data Science (RHODS) and Open Data Hub (ODH). It underpins the enterprise-ready AI and data platform watsonx.
fms-hf-tuning: Collection of tuning recipes
fms-hf-tuning is a collection of tuning recipes built on the Hugging Face SFTTrainer and PyTorch FSDP. The models are loaded from Hugging Face transformers or the foundation-model-stack, where they are optimized to use Flash Attention v2 either directly or through SDPA.
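fms-hf-tuning itself is driven by tuning configurations, but the underlying pattern it builds on is the Hugging Face SFTTrainer. The sketch below shows that pattern in its simplest form; the model id, the tiny in-memory dataset, and the hyperparameters are placeholders, and argument names may differ across trl versions.

```python
# Illustrative sketch of the Hugging Face SFTTrainer pattern that fms-hf-tuning builds on.
# The model id, dataset, and hyperparameters are placeholders, and argument names can
# differ across trl versions; fms-hf-tuning wraps this behind its tuning recipes/configs.
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Tiny in-memory dataset with a "text" field, standing in for a real instruction dataset
train_data = Dataset.from_list([
    {"text": "### Question: What is 2 + 2?\n### Answer: 4"},
    {"text": "### Question: Name a primary color.\n### Answer: Red"},
])

trainer = SFTTrainer(
    model="facebook/opt-125m",        # illustrative small model id
    train_dataset=train_data,
    dataset_text_field="text",
    args=TrainingArguments(output_dir="./out", num_train_epochs=1, per_device_train_batch_size=1),
)
trainer.train()
```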
vLLM: Library for LLM inference and serving
vLLM is a fast and easy-to-use library for LLM inference and serving. It offers interfaces for offline batched inference and online serving with an OpenAI-compatible server. With 21k GitHub stars, vLLM has become a top choice for open-source LLM inference.
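A minimal offline batched inference sketch with vLLM looks like the following; the model id is an illustrative placeholder, and the same library can instead be started as an OpenAI-compatible server for online serving.

```python
# Minimal vLLM offline batched inference sketch; the model id is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")   # any Hugging Face causal LM id can be used
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(
    ["Write a haiku about open source.", "Explain what a Kubernetes operator is."],
    params,
)
for out in outputs:
    print(out.outputs[0].text)
```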
TGIS: Text generation inference server
TGIS is the IBM fork of Text Generation Inference (TGI), which is a Hugging Face toolkit for deploying and serving Large Language Models (LLMs). TGI enables high-performance text generation for the most popular open-source LLMs, including Llama, Falcon, StarCoder, BLOOM, GPT-NeoX, and T5.
TGIS is a serving backend (server) that loads models and provides an inference engine.
Hugging Face libraries
Hugging Face is affectionately called the GitHub of AI; it hosts over 200k model repos and over 40k datasets, and those numbers are growing by the day. With Hugging Face, you can build, train, and deploy state-of-the-art models powered by a reference library of open source machine learning models and datasets.
Under the hood, watsonx.ai integrates many Hugging Face open source libraries, such as transformers (which has 100k+ GitHub stars!), accelerate, peft, and Hugging Face's Text Generation Inference server, just to name a few.
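The sketch below gives a flavor of how two of these libraries fit together: a small base model is loaded with transformers and wrapped with a LoRA adapter from peft for parameter-efficient fine-tuning. The base model id and LoRA settings are illustrative placeholders.

```python
# Illustrative sketch of the transformers + peft combination that watsonx.ai builds on.
# The base model id and LoRA hyperparameters are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "gpt2"  # small placeholder model; any causal LM on Hugging Face could be used
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Attach a small LoRA adapter so only a fraction of the parameters need training
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))
model.print_trainable_parameters()

inputs = tokenizer("Open source AI is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```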
Kubernetes DRA/InstaSlice
Both Kubernetes DRA and CodeFlare InstaSlice are strategic to watsonx.
Dynamic Resource Allocation (DRA) is an API for requesting and sharing resources between pods and between containers inside a pod. It is a generalization of the persistent volumes API for generic resources. Third-party resource drivers are responsible for tracking and allocating resources, with additional support provided by Kubernetes via structured parameters (introduced in Kubernetes 1.30).
CodeFlare InstaSlice facilitates the use of Dynamic Resource Allocation (DRA) on Kubernetes clusters for GPU sharing.
Kubeflow/Kubeflow Pipelines: ML workflow orchestration
Kubeflow is an open-source platform for machine learning and MLOps on Kubernetes. Kubeflow Pipelines is an end-to-end platform for ML workflow orchestration. Kubeflow Pipelines supports Tekton, the Red Hat-certified CI/CD pipeline technology on OpenShift, for running production ML pipelines on OpenShift.
Kubeflow Pipelines supports scheduling multi-step ML workflows and provides end-to-end orchestration of ML pipelines, including data processing, model training, prompt tuning, and model serving and monitoring. KFP is also used by Red Hat OpenShift Data Science (RHODS) and Open Data Hub (ODH).
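As a small example of the developer experience, the sketch below defines two lightweight components with the KFP v2 SDK, chains them into a pipeline, and compiles the result so a KFP backend can run it; the component logic and pipeline name are illustrative.

```python
# Minimal Kubeflow Pipelines (KFP v2 SDK) sketch: two lightweight Python components
# chained into a pipeline and compiled so a KFP backend can execute it.
from kfp import dsl, compiler

@dsl.component
def preprocess(msg: str) -> str:
    return msg.upper()

@dsl.component
def train(data: str) -> str:
    return f"model trained on: {data}"

@dsl.pipeline(name="demo-training-pipeline")
def demo_pipeline(msg: str = "hello watsonx"):
    step1 = preprocess(msg=msg)
    train(data=step1.output)

compiler.Compiler().compile(demo_pipeline, "demo_pipeline.yaml")
```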
Open Data Hub: Facilitate the AI development lifecycle
Open Data Hub (ODH) is a comprehensive collection of open-source tools designed to leverage the strengths of Red Hat OpenShift to facilitate the entire AI development lifecycle. Many of the technologies introduced in Open Data Hub mature to become part of Red Hat OpenShift AI and serve as the infrastructure foundation for watsonx.ai.
The Open Data Hub project provides open source tools for distributed AI and machine learning (ML) workflows, a Jupyter Notebook development environment, and monitoring. The CodeFlare SDK is integrated into the out-of-the-box ODH notebook images and provides an interactive client for data scientists to define resource requirements (GPU, CPU, and memory) and to submit and manage training jobs.
ODH adopted an organization and governance model to foster transparent decision making and innovation. You can learn more about the ODH Governance model in the ODH GitHub.
InstructLab is a model-agnostic open source AI project that facilitates contributions to Large Language Models (LLMs). It enables anyone to shape generative AI by contributing updates to existing LLMs in an accessible way. InstructLab's model-agnostic technology gives upstream model projects the ability to create regular builds of their open source licensed models, not by rebuilding and retraining the entire model but by composing new skills into it.
InstructLab provides a platform for easy engagement with LLMs through the ilab command-line interface (CLI) tool. Users can add to an LLM's capabilities by submitting skills and knowledge to the project's taxonomy repository on GitHub, simply by creating a pull request.
Granite is IBM’s flagship brand of open source large language models (LLMs) spanning multiple modalities.
We make AI as accessible as possible for as many developers as possible. That's why we have open-sourced the core Granite Code, Time Series, Language, and GeoSpatial models and made them available on Hugging Face under a permissive Apache 2.0 license that enables broad, unencumbered commercial usage.
All Granite models are trained on carefully curated data, with industry-leading levels of transparency about the data that went into them. We have also open-sourced the tools we use to ensure the data is high quality and up to the standards that enterprise-grade applications demand.
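Because the models are published on Hugging Face, loading one takes only a few lines of transformers code. The sketch below uses one of the released Granite Code model ids as an illustrative example; any Granite variant on the ibm-granite organization can be substituted.

```python
# Illustrative sketch of loading an open source Granite model from Hugging Face with
# transformers. The model id is used only as an example; any Granite variant on the
# ibm-granite organization can be substituted.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3b-code-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("def quicksort(items):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```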
Watsonx.ai brings together new generative AI capabilities, powered by foundation models, and traditional machine learning into a powerful platform spanning the AI lifecycle. With watsonx.ai, you can train, validate, tune, and deploy models with ease and build AI applications in a fraction of the time with a fraction of the data. Try the watsonx.ai next-generation studio for AI builders.