We are in the midst of a sea change in the data and AI landscape. Throughout the entire stack, from hardware to distributed data processing, all the way up to advanced machine learning and deep learning, these changes will have a profound impact on our society. In a McKinsey survey of 3,000 AI-aware C-level executives, only 20% said they currently use any AI-related technology at scale or in a core part of their businesses. There is a growing gap between early AI adopters and the rest of the community.
With the advent of open source deep learning engines such as TensorFlow, PyTorch, and Keras, there’s a rapidly growing need for skills and technologies that provide a consistent and standardized way to interact with these different machine learning engines. We need to drive standardized approaches in the industry (for example, ONNX) to ensure we are all marching towards a common goal with a common set of standards and technologies. In addition, we want these technologies to be democratized so that they’re easily accessible to and consumable by developers, both in open source and in enterprises.
At IBM, we’ve been a key community member and driver of this revolution. We believe these advances need to happen out in the open, driven by open standards, open code, open communities, and open governance. I’ve talked previously about the rich history of IBM and its contributions to open standards, as well as the work IBM is doing with IBM Code around democratizing these technologies using code, content, and community.
CODAIT: Center for Open-Source Data and AI Technologies
About two years ago, the Spark Technology Center (STC) launched to accelerate enterprise adoption of a critical technology in the distributed analytics space. Thanks to efforts by the vibrant and sizable Apache Spark community, Apache Spark has grown into an indispensable part of the enterprise analytics stack. The STC team made significant contributions to Apache Spark and the surrounding ecosystem (see the team statistics below).
Today, I am happy to announce the Spark Technology Center’s expanded mission, which now encompasses the end-to-end enterprise AI lifecycle. We will continue our work on Apache Spark and expand the team’s mission to include AI technologies, with a near-term focus on deep learning. The Center for Open-Source Data and AI Technologies, or CODAIT, will aim to make AI models dramatically easier to create, deploy, and manage in the enterprise.
As part of the launch, we are introducing two significant projects that will help improve the adoption of deep learning in the enterprise. Both projects open source significant research IP.
Fabric for Deep Learning (FfDL): We are announcing Fabric for Deep Learning, or FfDL (pronounced “fiddle”), as an open source project. It embraces a wide array of popular open source frameworks like TensorFlow, Caffe, and PyTorch, and offers them as a truly cloud-native service. Leveraging the power of Kubernetes, FfDL provides a scalable, resilient, and fault-tolerant deep learning framework.
Model Asset eXchange (MAX): In addition, as one of the first initiatives in this space, the CODAIT team is launching an open source enterprise Model Asset eXchange (MAX). MAX is a one-stop exchange where data scientists and AI developers can consume models created using their favorite machine learning engines, like TensorFlow, PyTorch, and Caffe2. It provides a standardized approach to classify, annotate, and deploy these models for prediction and inferencing.
CODAIT team statistics
- The CODAIT team contributes to over 10 open source projects. These projects include Spark, TensorFlow, Keras, SystemML, Arrow, Bahir, Toree, Livy, Zeppelin, R4ML, and Stocator.
- The team has 17 committers and many contributors to Apache projects, including Apache Spark, Apache Arrow, Apache SystemML, Apache Bahir, Apache Toree, and Apache Livy.
- The team has committed over 900 JIRAs and 50,000 lines of code to Apache Spark itself, and over 65,000 lines of code to Apache SystemML.
- The team is a leading contributor to Spark Machine Learning in major releases of Apache Spark.
- CODAIT engineers have interacted and worked closely with many of the more than 25 product lines within IBM leveraging Apache Spark.
- The team has presented at over 100 conferences, meetups, un-conferences, and other gatherings.
Help create a center of gravity for open source and enterprise AI developers!
We’re just getting started! We want to bring together open source and enterprise developers to create a center of gravity. Let’s all march towards a common goal of an AI renaissance with a standard set of tools and technologies!