We are in the midst of a sea change in the data and AI landscape. Throughout the entire stack, from hardware to distributed data processing, all the way up to advanced machine learning and deep learning, these changes will have a profound impact on our society. In a McKinsey survey of 3,000 AI-aware C-level executives, only 20% said they currently use any AI-related technology at scale or in a core part of their businesses. There is a growing gap between early AI adopters and the rest of the community.

With the advent of open source deep learning engines like TensorFlow, PyTorch, Keras, etc., there’s a rapidly growing need for skills and technologies that provide a consistent and standardized way to interact with these different machine learning engines. We need to drive standardized approaches in the industry (for example, ONNX) to ensure we are all marching towards a common goal with a common set of standards and technologies. In addition, we want these technologies to be democratized so that they’re easily accessible to and consumable by developers, both in open source and enterprises.

At IBM, we’ve been a key community member and driver of this revolution. We believe these advances need to happen out in the open, driven by open standards, open code, open communities, and open governance. I’ve talked previously about the rich history of IBM and its contributions to open standards, as well as the work IBM is doing with IBM Code around democratizing these technologies using code, content, and community.

CODAIT: Center for Open-Source Data and AI Technologies

About two years ago, the Spark Technology Center (STC) launched to accelerate enterprise adoption of a critical technology in the distributed analytics space. Thanks to the efforts of the vibrant and sizable Apache Spark community, Apache Spark has grown into an indispensable part of the enterprise analytics stack. The STC team made significant contributions to Apache Spark and the surrounding ecosystem (see the STC team statistics below).

Today, I am happy to announce the Spark Technology Center’s expanded mission, which now encompasses the end-to-end enterprise AI lifecycle. We will continue our work on Apache Spark and expand the team’s mission to include AI technologies, with a near-term focus on deep learning. The Center for Open-Source Data and AI Technologies, or CODAIT, will aim to make AI models dramatically easier to create, deploy, and manage in the enterprise.

As part of the launch, we are introducing two significant projects that will help improve the adoption of deep learning in the enterprise. Both projects open source significant research IP.

Fabric for Deep Learning (FfDL): We are announcing Fabric for Deep Learning, or FfDL (pronounced “fiddle”), as an open source project. It embraces a wide array of popular open source frameworks, such as TensorFlow, Caffe, and PyTorch, and offers them as a truly cloud-native service. Leveraging the power of Kubernetes, FfDL provides a scalable, resilient, and fault-tolerant deep-learning platform.
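To make the idea concrete, a training job on a platform like FfDL is typically described by a declarative manifest that names the framework and the resources each learner should get on Kubernetes. The exact schema is not given in this post, so the field names below are an illustrative assumption, not FfDL’s confirmed format:

```yaml
# Hypothetical FfDL-style training manifest.
# Field names are assumptions for illustration, not the project's confirmed schema.
name: mnist-example
description: Train a small TensorFlow model
version: "1.0"

# Resources requested for each learner pod on Kubernetes.
gpus: 0
cpus: 1
memory: 1Gb
learners: 1

# Which deep learning engine to run, and how to start training.
framework:
  name: tensorflow
  version: "1.5"
  command: python train_mnist.py --epochs 5
```

The key point is that the same manifest shape can target TensorFlow, Caffe, or PyTorch, leaving scheduling and fault tolerance to Kubernetes.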

Model Asset eXchange (MAX): In addition, as one of the first initiatives in this space, the CODAIT team is launching an open source enterprise Model Asset eXchange (MAX). MAX is a one-stop exchange where data scientists and AI developers can consume models created with their favorite machine learning engines, such as TensorFlow, PyTorch, and Caffe2, and it provides a standardized approach to classify, annotate, and deploy these models for prediction and inference.
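The standardized deployment story means a consumer of a MAX model can talk to it the same way regardless of the engine that produced it. Below is a minimal client sketch assuming the model is served locally as a container exposing a JSON prediction endpoint; the host, port, and route used here are assumptions for illustration, not details confirmed by this post:

```python
# Hypothetical client for a locally served MAX-style model.
# The endpoint details (localhost:5000, /model/predict) are assumptions.
import json
import urllib.request


def predict_url(host: str = "localhost", port: int = 5000,
                route: str = "/model/predict") -> str:
    """Build the prediction endpoint URL for a locally served model."""
    return f"http://{host}:{port}{route}"


def predict(payload: dict, url: str = None) -> dict:
    """POST a JSON payload to the model endpoint and return the parsed reply."""
    req = urllib.request.Request(
        url or predict_url(),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Usage (requires a model container running locally):
#   result = predict({"text": ["hello world"]})
```

Because the request shape is uniform, swapping a TensorFlow-backed model for a PyTorch-backed one should not change the client code.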

CODAIT team statistics

