Overview of CodeFlare

Automate and simplify model development

The key to modern AI is the emergence of transformers and large pre-trained models, sometimes called "foundation models." We can use foundation models to quickly perform tasks for a variety of AI applications with minimal effort. But businesses and data scientists still need an environment in which to further train and validate these models.

Project CodeFlare provides a simple, user-friendly abstraction for developing, scaling, queuing, and managing distributed ML and Python workloads on Red Hat OpenShift Container Platform. With CodeFlare, users automate and simplify the execution and scaling of the steps in the life cycle of model development, from data pre-processing through distributed model training and model adaptation to model validation.

Through transparent integration with the Ray and PyTorch frameworks, and the rich library ecosystem that runs on them, CodeFlare enables data scientists to spend more time on model development and less time on infrastructure deployment and scaling.

CodeFlare stack

CodeFlare installs on top of the Red Hat OpenShift Container Platform, which provides a secure Kubernetes environment with self-healing, scaling, resource management, and an operator framework. Red Hat OpenShift delivers a consistent experience across public cloud, hybrid cloud, and on-premises environments.

Open Data Hub (ODH) is installed into OpenShift and provides open source AI tools for running large and distributed AI workloads. CodeFlare leverages the ODH dashboard to serve the CodeFlare Jupyter notebook for easy integration with the CodeFlare-SDK. The CodeFlare Operator automates the deployment and configuration of the CodeFlare Stack.

The CodeFlare Stack includes:

  • Multi-Cluster Application Dispatcher (MCAD) is an open source project that came out of IBM Research. It provides an abstraction called an AppWrapper, which wraps all the resources of a job or application so that they are treated as a single unit. MCAD constantly monitors the distributed cluster(s) and supports queuing policies such as First In First Out, priority, and quota management.
  • InstaScale is also an open source project out of IBM Research. It's an optional component that works with MCAD and provides on-demand resource scaling on an OpenShift cluster. If enabled, it scales up additional Kubernetes worker nodes to run the requested job, and then scales the added worker nodes back down after the job completes, which reduces the need to keep expensive GPU worker nodes idle between jobs.
  • KubeRay is an open source Kubernetes operator that simplifies the deployment and management of Ray applications on Kubernetes. KubeRay fully manages the lifecycle of RayCluster resources, including creating or deleting clusters, autoscaling, and ensuring fault tolerance.
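
To make the queuing policies mentioned for MCAD more concrete, here is a minimal Python sketch of a dispatcher that orders jobs by priority, falls back to First In First Out among jobs of equal priority, and holds jobs back when a GPU quota would be exceeded. It is a toy model only: the ToyDispatcher class, its methods, and the job names are hypothetical and are not part of MCAD or the CodeFlare SDK, and the strict "head of the queue must fit" rule is a simplifying assumption.

```python
import heapq
import itertools


class ToyDispatcher:
    """Hypothetical stand-in for a queuing dispatcher; this is NOT MCAD's API."""

    def __init__(self, gpu_quota):
        self.gpu_quota = gpu_quota        # total GPUs the queue may hand out
        self.gpus_in_use = 0
        self._heap = []                   # entries: (-priority, arrival, name, gpus)
        self._arrival = itertools.count() # arrival counter gives FIFO tie-breaking

    def submit(self, name, gpus, priority=0):
        # Negate the priority so that the highest priority sorts first.
        heapq.heappush(self._heap, (-priority, next(self._arrival), name, gpus))

    def dispatch(self):
        """Start queued jobs in order until the head job no longer fits the quota."""
        started = []
        while self._heap and self.gpus_in_use + self._heap[0][3] <= self.gpu_quota:
            _, _, name, gpus = heapq.heappop(self._heap)
            self.gpus_in_use += gpus
            started.append(name)
        return started

    def complete(self, gpus):
        # Release the GPUs held by a finished job.
        self.gpus_in_use -= gpus


dispatcher = ToyDispatcher(gpu_quota=4)
dispatcher.submit("preprocess", gpus=1)
dispatcher.submit("train", gpus=4, priority=10)
dispatcher.submit("validate", gpus=1)

first = dispatcher.dispatch()   # only the high-priority job fits the quota
dispatcher.complete(4)          # the training job finishes and frees its GPUs
second = dispatcher.dispatch()  # remaining jobs start in arrival order
print(first, second)
```

In this sketch the "train" job is dispatched first because of its higher priority, and the two remaining jobs wait until its GPUs are released, then start in the order they were submitted. An on-demand scaler in the spirit of InstaScale would correspond to raising the quota while jobs are waiting and lowering it again afterward.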

At the heart of CodeFlare are the CodeFlare SDK and the CodeFlare CLI. The SDK and CLI are used to define, develop, and control remote distributed compute jobs and infrastructure from either a Python-based environment or a command-line interface. A custom Jupyter notebook is provided so that data scientists can quickly and easily interface with the CodeFlare components using the CodeFlare-SDK.

For more information on the CodeFlare stack and to see multiple demos, including InstaScale in action, see this CodeFlare video.

Summary and next steps

CodeFlare helps you automate and simplify execution and scaling across the model development lifecycle, from data pre-processing through distributed model training and model adaptation to model validation. If you want an enterprise-grade platform for managing the model development lifecycle, be sure to check out watsonx.ai.

Watsonx.ai brings together new generative AI capabilities, powered by foundation models, and traditional machine learning into a powerful platform spanning the AI lifecycle. With watsonx.ai, you can train, validate, tune, and deploy models with ease and build AI applications in a fraction of the time with a fraction of the data.

Try watsonx.ai, the next-generation studio for AI builders. Explore more articles and tutorials about watsonx on IBM Developer.