Begin your journey here
- Deep learning 101
- Watson Machine Learning 101
- Intro to the products
- Watson Machine Learning Community Edition
- Watson Machine Learning Accelerator
- PowerAI Vision
- Key technologies
- Large Model Support
- Distributed Deep Learning
- Snap ML
- Ideas to get you started
- Tips for adding deep learning and AI to your applications
- Blogs from the experts
What is deep learning?
Deep learning consists of algorithms that allow software to train itself by exposing multilayered neural networks to vast amounts of data. It is most often used for tasks like speech and image recognition.
The intelligence in the process sits within the deep learning frameworks, which develop the neural model of understanding by building weights and connections between many data points, often millions in a training data set.
Deep learning is, in effect, a technique for learning how to learn, and it is immensely powerful for helping you get the most out of your data.
However, whether you’re just starting your journey or you’re well on your way, you’ve probably pondered how to best address some common pain points related to deep learning and AI model development and deployment. Maybe you’ve asked:
- How do I scale my deep learning workloads?
- How can I shorten the time to train and deploy models?
- How do I address a lack of deep learning skills in my organization?
What is Watson Machine Learning?
With the Watson Enterprise AI family of products, IBM has built an end-to-end set of developer tools that can address your deep learning pain points and more. The software suite is based on open source software but enhances it for ease of use. Watson Machine Learning, which is what we’re focusing on here, is the execution control plane where you train your models, monitor model training, and run your inference. It was designed for the rapidly growing and quickly evolving AI category of deep learning. For more information, see Watson Studio (https://www.ibm.com/cloud/watson-studio) and Watson OpenScale (https://www.ibm.com/cloud/watson-openscale/).
Accelerated deep learning with IBM Power Systems
Running the WML products on the IBM Power Systems AC922, with its integrated NVLink 2.0, significantly accelerates communication not only between GPUs but also between GPU and CPU. This means that data can flow more easily from system memory into the GPUs and back again as needed. Learn more about the IBM Power Systems built for AI.
Watson Machine Learning Community Edition
Designed to get you set up and operating as quickly as possible, Watson Machine Learning Community Edition (WML CE) is delivered as a set of software packages that can deploy a functioning deep learning environment, potentially within hours, and usually in less than one hour with a few simple commands.
The software distributions are pre-compiled and include everything you need to build and manage a distributed environment, including the deep learning frameworks and any supporting software components that they require to run.
Watson Machine Learning Accelerator
For enterprises looking to rapidly scale their deep learning applications, Watson Machine Learning Accelerator (WML Accelerator) includes the open frameworks, libraries, and tools built into WML CE, plus additional components such as IBM Spectrum Conductor™ Deep Learning Impact and IBM Spectrum Conductor, providing functionality to optimize and speed up the completion of your training, testing, and validation. WML Accelerator truly shines when you are looking to expand into distributed deep learning with more than four nodes; it can scale from a single node to hundreds of nodes.
On top of the stack sits PowerAI Vision, which automates the machine learning and deep learning workflow. PowerAI Vision provides tools and interfaces that let business analysts, subject matter experts, and developers without deep learning skills begin using deep learning. This enterprise-grade software provides a complete ecosystem for labeling raw data sets and for training, creating, and deploying deep learning-based models. It can help train highly accurate models to classify images and detect objects in images and videos.
The tools help users focus on rapidly identifying and labeling datasets. Users can then train and validate a model in a GUI to build customized solutions for image classification and object detection.
Large Model Support
LMS addresses a fundamental limitation of deep learning: the amount of memory available on GPUs. When training complex models or training with high-definition images, the memory available on a GPU can be prohibitively restrictive. WML CE and WML Accelerator with LMS allow the GPU to access system (CPU) memory, resulting in better models and better precision. Learn more.
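Conceptually, LMS keeps only the tensors a computation currently needs resident in GPU memory and swaps inactive ones out to host memory. The toy class below illustrates that eviction idea only; it is not the WML CE LMS API, and all names are invented for this sketch:

```python
class TensorSwapper:
    """Toy illustration of the LMS idea: keep at most `gpu_slots` tensors
    'on the GPU'; evict the least-recently-used tensor to host memory."""

    def __init__(self, gpu_slots):
        self.gpu_slots = gpu_slots
        self.gpu = {}   # name -> tensor; insertion order tracks recency
        self.host = {}  # tensors swapped out to system (CPU) memory

    def access(self, name, tensor=None):
        if name in self.gpu:
            # Already resident: mark as most recently used.
            self.gpu[name] = self.gpu.pop(name)
        else:
            if tensor is None:
                # Swap the tensor back in from host memory.
                tensor = self.host.pop(name)
            if len(self.gpu) >= self.gpu_slots:
                # Evict the least-recently-used tensor to host memory.
                lru_name, lru_tensor = next(iter(self.gpu.items()))
                self.host[lru_name] = lru_tensor
                del self.gpu[lru_name]
            self.gpu[name] = tensor
        return self.gpu[name]
```

In the real implementation, this swapping happens transparently inside the framework's memory allocator, timed so that transfers over NVLink 2.0 overlap with computation.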
Distributed Deep Learning
Distributed Deep Learning (DDL) distributes a single training job across a cluster of servers, reducing the time needed to train a model. WML CE and WML Accelerator with Distributed Deep Learning can scale jobs across large numbers of cluster resources with very little loss to communication overhead. Learn more.
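At the heart of synchronous data-parallel training is an all-reduce step that averages the gradients computed by each worker. DDL implements this with optimized, topology-aware communication; the function below is only a pure-Python sketch of the averaging itself:

```python
def allreduce_average(gradient_shards):
    """Average per-worker gradient vectors, element-wise.

    gradient_shards: one gradient vector (list of floats) per worker.
    Returns the averaged gradient every worker would apply.
    """
    n_workers = len(gradient_shards)
    length = len(gradient_shards[0])
    return [
        sum(shard[i] for shard in gradient_shards) / n_workers
        for i in range(length)
    ]
```

Because every worker applies the same averaged gradient, the cluster behaves like a single trainer with a proportionally larger batch size.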
Snap ML
Snap ML is a library for training generalized linear models, developed by IBM with the goal of removing training time as a bottleneck for machine learning applications. Snap ML supports a large number of classical machine learning models and scales gracefully to data sets with billions of examples and/or features. It offers distributed training and GPU acceleration, and supports sparse data structures. Learn more.
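As a library-free illustration of the simplest generalized linear model of the kind Snap ML trains, here is one-feature ordinary least squares. This is purely didactic; Snap ML's actual estimators expose a scikit-learn-style fit/predict interface and handle far larger, sparse, distributed data:

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = w*x + b (one feature).

    Returns the slope w and intercept b minimizing squared error.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form OLS solution for a single feature.
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - w * mean_x
    return w, b
```

The value of a library like Snap ML is doing this kind of fit, generalized to many features, regularization, and classification losses, at the scale of billions of examples.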
Tips to get you started
Watson Machine Learning Community Edition (WML CE) helps you get started faster with your deep learning development. Here are some tips for using WML CE to add deep learning and AI to your application.
Follow these simple steps to get your application development started.
Deploy WML CE
WML CE deploys on a system far more rapidly than manual installation of frameworks. Start with:
- Red Hat Enterprise Linux (RHEL) 7.6 or Ubuntu 18.04
- NVIDIA CUDA SDK
- NVIDIA GPU Driver for Linux
The WML CE binaries are available as Conda packages and run on IBM Power Systems S822LC and AC922. See the WML CE release notes for more information.
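As a concrete sketch, a WML CE environment can be created with a few conda commands. The channel URL and package name below follow the WML CE documentation at the time of writing; verify both against the release notes for your version before running them:

```shell
# Add the IBM AI conda channel (check the URL in the WML CE release notes)
conda config --prepend channels \
    https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/

# Create a dedicated environment with the full WML CE package set
conda create --name wmlce python=3.6 powerai

# Activate the environment before using any of the frameworks
conda activate wmlce
```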
Test your frameworks
Once you've deployed WML CE, you can test each of the deep learning training frameworks. Each framework included in WML CE is unique, and selecting a preferred framework for your application is important.
The integrated installer for WML CE means you have everything installed and performant, so you can rapidly try examples in each framework and select the one best suited to your needs.
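A quick way to confirm which frameworks your environment provides is a small, stdlib-only smoke test. The framework list below is illustrative; adjust it to match what your WML CE release ships:

```python
import importlib

# Module name -> display name; edit to match your installed frameworks.
FRAMEWORKS = {
    "tensorflow": "TensorFlow",
    "torch": "PyTorch",
    "caffe": "Caffe",
}

def check_frameworks(names):
    """Return a dict mapping display name -> version string,
    "unknown" if the module has no __version__, or None if absent."""
    results = {}
    for module_name, display in names.items():
        try:
            mod = importlib.import_module(module_name)
            results[display] = getattr(mod, "__version__", "unknown")
        except ImportError:
            results[display] = None
    return results

if __name__ == "__main__":
    for name, version in check_frameworks(FRAMEWORKS).items():
        print(f"{name}: {version if version else 'not installed'}")
```

After confirming imports, run each framework's bundled examples to compare performance and ergonomics for your workload.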
Devise your approach and start training
Collecting great input data to train on is critical to your model's success. Don't neglect existing data inside your organization, and consider training on external datasets as well. Your data can be visual, audio, text, or beyond.
Packages like TensorFlow in WML CE incorporate tools to help make your training network design even easier.
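Whatever framework you choose, you will typically hold out part of your collected data for validation before training. A minimal, framework-agnostic sketch of an 80/20 split (pure Python; real pipelines would use the dataset utilities your framework provides):

```python
import random

def split_dataset(samples, train_fraction=0.8, seed=42):
    """Shuffle samples reproducibly and split into (train, validation)."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = samples[:]      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Keeping the validation set untouched during training is what lets you detect overfitting before you deploy a model.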
Add deep learning to your applications
For the Enterprise:
- Layer Deep Learning atop your existing data-store: Tease out value from your existing data by applying deep learning as a technique for advanced analysis.
- Reshape or augment an existing business process: Augment human insight or manual labor with machine intelligence. Use deep learning to train a visual or audio recognition system that helps guide decisions.
For High Performance Computing (HPC):
- Apply deep learning before HPC simulation: Improve the quality of your HPC simulation runs by using deep learning to identify which kinds of simulations to run or run first. Then run those high-likelihood simulations with greater precision.
- Apply deep learning after HPC simulation: Drowning in data after your HPC simulations run? Sift through existing unstructured data or vast outputs of a simulation with Deep Learning and gain new insights rapidly.
Deep learning and AI on IBM Power Systems blogs
Read what the experts are saying about deep learning with Watson Machine Learning and IBM PowerAI Vision.
As part of PowerAI Vision's labeling, training, and inference workflow, you can export models that can be deployed on edge devices (such as FRCNN and SSD object detection models that support TensorRT conversions). To enable you to start performing inferencing on edge devices as quickly as possible, we created a repository of samples that illustrate...
Large model support (LMS) technology enables training of large deep neural networks that do not fit into GPU memory. In this blog, we showcase the advantages of using IBM's WML CE 1.6.1 TensorFlow Large Model Support (TFLMS) on the DeepLabv3+ model and perform a competitive comparison to highlight the IBM® POWER9™ processor's NVLink 2.0 advantages while training such...
Large Model Support (LMS) technology enables training of large deep neural networks that would exhaust GPU memory while training. PyTorch is a relatively new and popular Python-based open source deep learning framework built by Facebook for faster prototyping and production deployment. With its more pythonic nature and less steep learning curve compared to other frameworks,...
Co-authors: Christine Ouyang and Igor Khapov While artificial intelligence (AI)-assisted quality inspection significantly improves inspection cycle time and inspection accuracy, management and support for hundreds of thousands of cameras, robotic arms, and robots can be a challenge. A discussion on the implementation and deployment of AI-assisted quality inspection in an actual manufacturing production environment would...
The Center for Genome Research and Biocomputing (CGRB) at Oregon State University works closely with hardware vendors to test different configurations. Many of these configurations push the limits of processing hardware because they are used for cutting-edge research across a gamut of disciplines. Through the process of working with NVIDIA general-purpose computing on graphics processing...
In previous blog posts, we’ve discussed how to enable GPUs with Docker alone. In this post, we’ll walk you through enabling GPUs in Red Hat OpenShift. The notable difference is that OpenShift is Kubernetes-based and it includes additional features that ease GPU integration. One of these features is the device plugin, which can be used...
Watson Machine Learning Community Edition 1.6.2 has been released! The conda packages in the main channel have been updated and the container images on dockerhub are new. If you happen by our main channel with a browser, you will also notice WML CE now has a brand new dashboard web front end. The dashboard will...
cudaSuccess (3 vs. 0) initialization error. TL;DR: If you're on an AC922 server and are experiencing CUDA-related initialization or memory errors when running on a containerized platform (such as Docker, Kubernetes, or OpenShift), you may have a mismatch in your platform's cpuset slice due to a race condition onlining GPU memory. Run https://github.com/IBM/powerai/blob/master/support/cpuset_fix/cpuset_check.sh on...
If you are trying to develop machine or deep learning models, chances are that you have used many open source Python libraries. Python packages are commonly found in popular open source package repositories such as PyPI and Anaconda Distribution. Some Python packages, e.g., TensorFlow, include native hardware- and operating system-specific libraries. Read on to see...
While the IBM Power platform has proven a valuable asset in tackling machine and deep learning challenges (MLDL), installing software necessary for such tasks has been quite challenging on any architecture even with many groups providing support. Popular tools such as Tensorflow and PyTorch depend on many python libraries and the use of finicky installation...