Begin your journey here
- Deep learning 101
- Watson Machine Learning 101
- Intro to the products
- Watson Machine Learning Community Edition
- Watson Machine Learning Accelerator
- PowerAI Vision
- Key technologies
- Large Model Support
- Distributed Deep Learning
- Snap ML
- Ideas to get you started
- Tips for adding deep learning and AI to your applications
- Blogs from the experts
What is deep learning?
Deep learning consists of algorithms that permit software to train itself by exposing multilayered neural networks to vast amounts of data. It is most frequently used to perform tasks like speech and image recognition.
The intelligence in the process sits within the deep learning software frameworks, which develop the neural model of understanding by building weights and connections between many data points, often millions in a training data set.
Deep learning effectively learns how to learn, and it is immensely powerful for helping you get the most out of your data.
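The weight-building process described above can be illustrated with a minimal gradient-descent loop in plain Python. This is a purely conceptual sketch, not WML code: one weight, a handful of data points, and the same update rule that frameworks apply across millions of weights.

```python
# Minimal sketch: one "neuron" learning y = 2*x via gradient descent.
# Real deep learning frameworks apply this idea across millions of
# weights and training examples; the mechanics are the same.

def train(data, lr=0.1, epochs=100):
    w = 0.0  # single weight, initialized to zero
    for _ in range(epochs):
        for x, y in data:
            pred = w * x
            grad = 2 * (pred - y) * x  # derivative of squared error w.r.t. w
            w -= lr * grad             # gradient-descent update
    return w

samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
weight = train(samples)
print(round(weight, 3))  # converges toward 2.0
```

Each pass nudges the weight to reduce prediction error; "training" is nothing more than repeating this adjustment over the data set.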
However, whether you’re just starting your journey or you’re well on your way, you’ve probably pondered how to best address some common pain points related to deep learning and AI model development and deployment. Maybe you’ve asked:
- How do I scale my deep learning workloads?
- How can I shorten the time to train and deploy models?
- How do I address a lack of deep learning skills in my organization?
What is Watson Machine Learning?
With the Watson Enterprise AI family of products, IBM has built an end-to-end set of developer tools that can address your deep learning pain points and more. The software suite is based on open source software, enhanced for ease of use. Watson Machine Learning, which is what we're focusing on here, is the execution control plane where you train your models, monitor model training, and run your inference. It was designed for the rapidly growing and quickly evolving AI category of deep learning. For more information, see Watson Studio (https://www.ibm.com/cloud/watson-studio) and Watson OpenScale (https://www.ibm.com/cloud/watson-openscale/).
Accelerated deep learning with IBM Power Systems
Running the WML products on the IBM Power Systems AC922, with its integrated NVLink 2.0, not only accelerates communication between GPUs, it also significantly accelerates communication from GPU to CPU. This means that data can flow more easily from system memory into the GPUs and back again as needed. Learn more about the IBM Power Systems built for AI.
Watson Machine Learning Community Edition
Designed to get you set up and operating as quickly as possible, Watson Machine Learning Community Edition (WML CE) is delivered as a set of software packages that can deploy a functioning deep learning environment with a few simple commands, usually in less than an hour.
The software distributions are pre-compiled and include everything you need to build and manage a distributed environment, including the deep learning frameworks and any supporting software components that they require to run.
Watson Machine Learning Accelerator
For enterprises looking to rapidly scale their deep learning applications, Watson Machine Learning Accelerator (WML Accelerator) includes the open frameworks, libraries, and tools built into WML CE, plus additional components such as IBM Spectrum Conductor™ Deep Learning Impact and IBM Spectrum Conductor, which optimize and speed up the completion of your training, testing, and validation. WML Accelerator truly shines when you expand into distributed deep learning with more than four nodes, and it can scale from a single node to hundreds of nodes.
PowerAI Vision
On top of the stack is PowerAI Vision, which automates the machine learning and deep learning workflow. PowerAI Vision provides tools and interfaces that let business analysts, subject matter experts, and developers with no deep learning skills begin using deep learning. This enterprise-grade software provides a complete ecosystem for labeling raw data sets and for training, creating, and deploying deep learning-based models. It can help train highly accurate models to classify images and detect objects in images and videos.
The tools help users focus on rapidly identifying and labeling datasets. Users can then train and validate a model through a GUI to build customized solutions for image classification and object detection.
Large Model Support
Large Model Support (LMS) addresses a fundamental limitation of deep learning: the amount of memory available on GPUs. When training complex models or training with high-definition images, the memory available on a GPU can be prohibitively restrictive. WML CE and WML Accelerator with LMS allow the GPU to access system (CPU) memory, resulting in better models and better precision. Learn more.
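Conceptually, LMS keeps only what fits in GPU memory and swaps other tensors out to the much larger system memory, bringing them back on demand. A toy stdlib-Python sketch of that memory movement follows; the real implementation operates inside the framework's computation graph, and all names here are illustrative:

```python
# Conceptual sketch of tensor swapping: a small "GPU" cache backed by
# large "host" (CPU) memory. Real LMS inserts swap-in/swap-out nodes into
# the framework's graph; this toy model only illustrates the movement.
from collections import OrderedDict

class SwappingMemory:
    def __init__(self, gpu_capacity):
        self.gpu = OrderedDict()   # limited, fast memory (oldest first)
        self.host = {}             # large system memory
        self.capacity = gpu_capacity

    def store(self, name, tensor):
        self.gpu[name] = tensor
        self.gpu.move_to_end(name)
        while len(self.gpu) > self.capacity:        # GPU full:
            victim, data = self.gpu.popitem(last=False)
            self.host[victim] = data                # swap oldest out to host

    def fetch(self, name):
        if name not in self.gpu:                    # swap back in on demand
            self.store(name, self.host.pop(name))
        self.gpu.move_to_end(name)
        return self.gpu[name]

mem = SwappingMemory(gpu_capacity=2)
for i in range(4):
    mem.store(f"act{i}", [i] * 4)   # activations from 4 layers
print(sorted(mem.host))             # oldest activations now live on the host
```

The trade-off is extra transfer time over the CPU-GPU link, which is why the NVLink 2.0 bandwidth on the AC922 matters so much for LMS workloads.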
Distributed Deep Learning
Distributed Deep Learning (DDL) distributes a single training job across a cluster of servers, reducing the time needed to train a model. WML CE and WML Accelerator with DDL can scale jobs across large numbers of cluster resources with very little loss to communication overhead. Learn more.
Snap ML
Snap ML is a library for training generalized linear models. IBM is developing it with the goal of removing training time as a bottleneck for machine learning applications. Snap ML supports a large number of classical machine learning models and scales gracefully to data sets with billions of examples and/or features. It offers distributed training and GPU acceleration, and it supports sparse data structures. Learn more.
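To illustrate what training a generalized linear model involves, here is a self-contained logistic-regression sketch in plain Python. This is not Snap ML code; Snap ML provides optimized, distributed, GPU-accelerated implementations of models in this family, while the sketch below only shows the underlying mathematics on a toy data set:

```python
# A generalized linear model (logistic regression) trained with plain
# gradient descent. Snap ML trains models of this family at far larger
# scale; this sketch only demonstrates the core computation.
import math

def train_logreg(data, lr=0.1, epochs=1000):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # sigmoid link
            w -= lr * (p - y) * x                      # log-loss gradient
            b -= lr * (p - y)
    return w, b

def predict(w, b, x):
    return 1 if w * x + b > 0 else 0

# Toy data: label is 1 when x > 2.5.
data = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]
w, b = train_logreg(data)
print([predict(w, b, x) for x in [1.5, 3.5]])
```

Training time for models like this grows with examples and features, which is exactly the bottleneck Snap ML targets at billion-example scale.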
Tips to get you started
Watson Machine Learning Community Edition (WML CE) helps you get started faster with your deep learning development. Here are some tips for using WML CE to add deep learning and AI to your application.
Follow these simple steps to get your application development started.
Deploy WML CE
WML CE deploys on a system far more rapidly than manual installation of frameworks. Start with:
- Red Hat Enterprise Linux (RHEL) 7.6 or Ubuntu 18.04
- NVIDIA CUDA SDK
- NVIDIA GPU Driver for Linux
The WML CE binaries are available as Conda packages and run on IBM Power Systems S822LC and AC922. See the WML CE release notes for more information.
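Before and after installing, it can help to script a quick sanity check of the prerequisites above. The following is a hedged, stdlib-only sketch; the exact components worth checking depend on your release, so treat the list as an example:

```python
# Prerequisite check sketch for a WML CE deployment: confirm the OS and
# the NVIDIA/conda tooling listed above are visible on this machine.
# The items checked here are illustrative; see the WML CE release notes
# for the authoritative prerequisites.
import platform
import shutil

def check_prerequisites():
    return {
        "os": f"{platform.system()} {platform.release()}",
        "conda on PATH": shutil.which("conda") is not None,
        "nvidia-smi on PATH": shutil.which("nvidia-smi") is not None,  # GPU driver
        "nvcc on PATH": shutil.which("nvcc") is not None,              # CUDA SDK
    }

for item, status in check_prerequisites().items():
    print(f"{item}: {status}")
```

A script like this makes it easy to spot a missing driver or CUDA toolkit before the conda install, rather than during a failed training run.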
Test your frameworks
Once you've deployed WML CE, you can test each of the deep learning training frameworks. Each framework included in WML CE is unique, and selecting a preferred framework for your application is important.
The integrated installer for WML CE means you have everything installed and performant, so you can rapidly try examples in each framework and select the one best suited to your needs.
Devise your approach and start training
Collecting great input data to train on is critical to your model's success. Don't neglect existing data inside your organization, and consider training on external datasets as well. Your data can be visual, audio, text, or beyond.
Packages like TensorFlow in WML CE incorporate tools to help make your training network design even easier.
Add deep learning to your applications
For the Enterprise:
- Layer Deep Learning atop your existing data-store: Tease out value from your existing data by applying deep learning as a technique for advanced analysis.
- Reshape or augment an existing business process: Augment human insight or manual labor with machine intelligence. Use deep learning to train a visual or audio recognition system that helps guide decisions.
For High Performance Computing (HPC)
- Apply deep learning before HPC simulation: Improve the quality of your HPC simulation runs by using deep learning to identify which kinds of simulations to run or run first. Then run those high-likelihood simulations with greater precision.
- Apply deep learning after HPC simulation: Drowning in data after your HPC simulations run? Sift through existing unstructured data or vast outputs of a simulation with Deep Learning and gain new insights rapidly.
Deep learning and AI on IBM Power Systems blogs
Read what the experts are saying about deep learning with Watson Machine Learning and IBM PowerAI Vision.
If you are trying to develop machine learning or deep learning models, chances are that you have used many open source Python libraries. Python packages are commonly found in popular open source package repositories such as PyPI and the Anaconda Distribution. Some Python packages, such as TensorFlow, include native hardware- and operating system-specific libraries. Read on to see...
While the IBM Power platform has proven a valuable asset in tackling machine learning and deep learning (MLDL) challenges, installing the software necessary for such tasks has been quite challenging on any architecture, even with many groups providing support. Popular tools such as TensorFlow and PyTorch depend on many Python libraries and the use of finicky installation...
Snap ML is available as part of IBM Watson Machine Learning Community Edition (WML CE) 1.6.1, a component in WML Accelerator 1.2.1. By setting up a WML Accelerator environment that can execute snap-ml-spark APIs, you can complete the following Snap ML operations in WML Accelerator: running snap-ml-spark applications through spark-submit, enabling snap-ml-spark APIs inside...
The 1.6.1 release of Watson Machine Learning Community Edition (WML-CE) added packages for both TensorRT and TensorFlow Serving. These two packages provide functions that can be used for inference work. This article describes the steps that a user should perform to use TensorRT-optimized models and to deploy them with TensorFlow Serving. Introduction to using TensorRT...
Data preprocessing is an integral part of any neural network and is often complex and expensive. Traditionally, these operations are carried out on the CPU, which creates a bottleneck in systems with higher GPU to CPU ratios. This limits the performance of training and inference due to the compute-intensive nature of traditional preprocessing operations. Additionally,...
In WML CE 1.6.1, TensorRT was added as a technology preview. TensorRT is a platform for high-performance deep learning inference that can be used to optimize trained models. This is done by replacing TensorRT-compatible subgraphs with a single TRTEngineOp that is used to build a TensorRT engine. These engines are a network of layers and...
The IBM Power System AC922 can have many physical cores, and with the ability to specify a symmetric multithreading value of 4 (SMT4), this can lead to a very large number of logical processors. This allows a high amount of concurrent work across physical CPU cores. When using the AC922’s GPUs for TensorFlow jobs, the...
Previous blogs and videos have discussed tensor swapping with TensorFlow Large Model Support (TFLMS) while running on the IBM Power Systems AC922. Unlike other systems, IBM Power Systems connect their GPUs to their CPUs using high bandwidth NVLink connections. This has been shown to produce substantial speed improvements to model training while using TensorFlow Large...
Image data channel ordering is usually specified as "channels first" (NCHW) or "channels last" (NHWC). In many cases, operations on GPUs run faster with data in "channels first" format. TensorFlow contains a layout optimizer that will attempt to transpose the data for the fastest computation. The data transformations produce tensors which will consume GPU memory...
TensorFlow Large Model Support (TFLMS) is a Python module that provides an approach to training large models and data that cannot normally be fit in to GPU memory. It takes a computational graph defined by users, and automatically adds swap-in and swap-out nodes for transferring tensors from GPUs to the host and vice versa. During...