Begin your journey here
- Deep learning 101
- Watson Machine Learning 101
- Intro to the products
- Watson Machine Learning Community Edition
- Watson Machine Learning Accelerator
- PowerAI Vision
- Key technologies
- Large Model Support
- Distributed Deep Learning
- Snap ML
- Ideas to get you started
- Tips for adding deep learning and AI to your applications
- Blogs from the experts
What is deep learning?
Deep learning consists of algorithms that allow software to train itself by exposing multilayered neural networks to vast amounts of data. It is most frequently used to perform tasks like speech and image recognition.
The intelligence in the process sits within the deep learning software frameworks, which develop the neural model of understanding by building weights and connections between many data points, often millions in a training data set.
Deep learning effectively learns how to learn, and it is immensely powerful for helping you get the most out of your data.
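The weights-and-connections idea above can be sketched as a tiny forward pass. This is a toy illustration in plain Python, not any framework's API; the weights are hypothetical fixed values that training would normally adjust.

```python
import math

# A toy two-layer network: weights are nested lists of floats.
# Training would adjust these weights; here they are fixed for illustration.
def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation):
    # Each output neuron is a weighted sum of all inputs plus a bias.
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs):
    hidden = dense(inputs, [[0.5, -0.2], [0.1, 0.8]], [0.0, 0.1], relu)
    output = dense(hidden, [[1.0, -1.0]], [0.0], sigmoid)
    return output[0]

score = forward([1.0, 2.0])   # a value between 0 and 1
```

A real framework does the same arithmetic over millions of weights, on GPUs, and learns the weight values from data.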
However, whether you’re just starting your journey or you’re well on your way, you’ve probably pondered how to best address some common pain points related to deep learning and AI model development and deployment. Maybe you’ve asked:
- How do I scale my deep learning workloads?
- How can I shorten the time to train and deploy models?
- How do I address a lack of deep learning skills in my organization?
What is Watson Machine Learning?
With the Watson Enterprise AI family of products, IBM has built an end-to-end set of developer tools that can address your deep learning pain points and more. The software suite is based on open source software but enhances it for ease of use. Watson Machine Learning, the focus here, is the execution control plane where you train your models, monitor model training, and run your inference. It was designed for the rapidly growing and quickly evolving AI category of deep learning. For more information, see Watson Studio (https://www.ibm.com/cloud/watson-studio) and Watson OpenScale (https://www.ibm.com/cloud/watson-openscale/).
Accelerated deep learning with IBM Power Systems
Running the WML products on the IBM Power Systems AC922, with its integrated NVLink 2.0, not only accelerates communication between GPUs, it also significantly accelerates communication between GPU and CPU. This means that data can more easily flow from system memory into GPUs and back again as needed. Learn more about the IBM Power Systems built for AI.
Watson Machine Learning Community Edition
Designed to get you set up and operating as quickly as possible, Watson Machine Learning Community Edition (WML CE) is delivered as a set of software packages that can deploy a functioning deep learning environment, potentially within hours, and usually in less than one hour with a few simple commands.
The software distributions are pre-compiled and include everything you need to build and manage a distributed environment, including the deep learning frameworks and any supporting software components that they require to run.
Watson Machine Learning Accelerator
For enterprises looking to rapidly scale their deep learning applications, Watson Machine Learning Accelerator (WML Accelerator) includes the open frameworks, libraries, and tools built into WML CE, plus additional components such as IBM Spectrum Conductor™ Deep Learning Impact and IBM Spectrum Conductor, providing functionality to optimize and speed up the completion of your training, testing, and validation. WML Accelerator truly shines when you are looking to expand into distributed deep learning with more than four nodes; it can scale from a single node to hundreds of nodes.
PowerAI Vision
On top of the stack is PowerAI Vision, which automates the machine learning and deep learning workflow. PowerAI Vision provides tools and interfaces that let business analysts, subject matter experts, and developers without any deep learning skills begin using deep learning. This enterprise-grade software provides a complete ecosystem to label raw data sets and to train, create, and deploy deep learning-based models. It can help train highly accurate models to classify images and detect objects in images and videos.
The tools help users focus on rapidly identifying and labeling data sets. Users can then train and validate a model through a GUI to build customized solutions for image classification and object detection.
Large Model Support
LMS addresses a fundamental limitation of deep learning: the amount of memory available on GPUs. When training complex models or training with high-definition images, the memory available on a GPU can be prohibitively restrictive. WML CE and WML Accelerator with LMS allow the GPU to access system (CPU) memory, resulting in better models and better precision. Learn more.
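The swap-out/swap-in idea behind LMS can be illustrated with a toy sketch. This is plain Python and not the TFLMS API: a "store" with a fixed GPU budget evicts the oldest tensor to host memory when space runs out and swaps it back in on demand.

```python
from collections import OrderedDict

# Toy sketch of large-model-support-style tensor swapping (not the TFLMS API):
# when "GPU" memory is full, the oldest tensor is swapped out to host memory,
# and swapped back in when it is needed again.
class SwappingStore:
    def __init__(self, gpu_budget_bytes):
        self.gpu_budget = gpu_budget_bytes
        self.gpu = OrderedDict()   # name -> size; insertion order = age
        self.host = {}

    def _gpu_used(self):
        return sum(self.gpu.values())

    def put(self, name, size):
        # Evict oldest tensors to host until the new tensor fits on the GPU.
        while self.gpu and self._gpu_used() + size > self.gpu_budget:
            old_name, old_size = self.gpu.popitem(last=False)
            self.host[old_name] = old_size   # swap out
        self.gpu[name] = size

    def get(self, name):
        # Swap the tensor back in if it was evicted to host memory.
        if name in self.host:
            self.put(name, self.host.pop(name))
        return name in self.gpu

store = SwappingStore(gpu_budget_bytes=100)
store.put("activations_1", 60)
store.put("activations_2", 60)     # evicts activations_1 to host
store.get("activations_1")         # swaps it back in
```

TFLMS does this at the level of a TensorFlow computation graph, inserting swap nodes automatically rather than managing a cache by hand.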
Distributed Deep Learning
Distributed Deep Learning (DDL) distributes a single training job across a cluster of servers, reducing the time needed to train a model. WML CE and WML Accelerator with DDL can scale jobs across large numbers of cluster resources with very little loss due to communications overhead. Learn more.
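The core pattern DDL uses, data parallelism, can be sketched in a few lines. Each "worker" computes a gradient on its shard of the data, the gradients are averaged (an all-reduce), and the shared weight is updated. Real DDL runs workers on separate servers with fast interconnects; here the workers are simulated as loop iterations over a hypothetical one-parameter model.

```python
# Toy sketch of data-parallel training: each "worker" computes a gradient on
# its own shard, gradients are averaged (an all-reduce), then the shared
# weight is updated. In real DDL the workers are separate servers.
def gradient(w, shard):
    # Gradient of mean squared error for the toy model y = w * x.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(grads):
    return sum(grads) / len(grads)

data = [(x, 3.0 * x) for x in range(1, 9)]   # true weight is 3.0
shards = [data[0:4], data[4:8]]              # one shard per worker

w = 0.0
for step in range(50):
    local_grads = [gradient(w, shard) for shard in shards]
    w -= 0.01 * allreduce_mean(local_grads)
# w converges toward 3.0
```

The communications overhead the text mentions is exactly the cost of `allreduce_mean` when it runs over a network instead of a list.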
Snap ML
Snap ML is a library for training generalized linear models. It is being developed by IBM with the goal of removing training time as a bottleneck for machine learning applications. Snap ML supports a large number of classical machine learning models and scales gracefully to data sets with billions of examples and/or features. It offers distributed training and GPU acceleration, and supports sparse data structures. Learn more.
Tips to get you started
Watson Machine Learning Community Edition (WML CE) helps you get started faster with your deep learning development. Here are some tips for using WML CE to add deep learning and AI to your application.
Follow these simple steps to get your application development started.
Deploy WML CE
WML CE deploys on a system far more rapidly than manual installation of frameworks. Start with:
- Red Hat Enterprise Linux (RHEL) 7.6 or Ubuntu 18.04
- NVIDIA CUDA SDK
- NVIDIA GPU Driver for Linux
The WML CE binaries are available as Conda packages and run on IBM Power Systems S822LC and AC922. See the WML CE release notes for more information.
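As a concrete starting point, the conda-based install can be sketched as below. This is a hypothetical sequence based on the WML CE documentation; verify the channel URL, Python version, and package names against the release notes for your WML CE version.

```shell
# Hypothetical WML CE install sketch; check the release notes for your
# version before running. Add the IBM AI conda channel:
conda config --prepend channels \
    https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/

# Create and activate a dedicated environment.
conda create -n wmlce python=3.6
conda activate wmlce

# Install the full framework bundle (or individual packages instead).
conda install powerai
```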
Test your frameworks
Once you've deployed WML CE, you can test each of the deep learning training frameworks. Each framework included in WML CE is unique, and selecting a preferred framework for your application is important.
The integrated installer for WML CE means you have everything installed and performant, so you can rapidly try examples in each framework and select the one best suited to your needs.
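A quick way to see which frameworks your environment actually provides is to probe for them with the standard library. The module names below are common examples, not a definitive WML CE package list.

```python
import importlib.util

# Report which deep learning frameworks are importable in the current
# environment; the names here are common examples, not an exhaustive list.
def available_frameworks(names):
    return {name: importlib.util.find_spec(name) is not None
            for name in names}

status = available_frameworks(["tensorflow", "torch", "caffe"])
for name, ok in sorted(status.items()):
    print(f"{name}: {'installed' if ok else 'missing'}")
```

Running this inside an activated WML CE environment versus outside it shows at a glance what the installer set up for you.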
Devise your approach and start training
Collecting great input data to train on is critical to your model's success. Don't neglect existing data inside your organization, and consider training on external data sets as well. Your data can be visual, audio, text, or beyond.
Packages like TensorFlow in WML CE incorporate tools to help make your training network design even easier.
Add deep learning to your applications
For the Enterprise:
- Layer Deep Learning atop your existing data-store: Tease out value from your existing data by applying deep learning as a technique for advanced analysis.
- Reshape or augment an existing business process: Augment human insight or manual labor with machine intelligence. Use deep learning to train a visual or audio recognition system that helps guide decisions.
For High Performance Computing (HPC):
- Apply deep learning before HPC simulation: Improve the quality of your HPC simulation runs by using deep learning to identify which kinds of simulations to run or run first. Then run those high-likelihood simulations with greater precision.
- Apply deep learning after HPC simulation: Drowning in data after your HPC simulations run? Sift through existing unstructured data or vast outputs of a simulation with Deep Learning and gain new insights rapidly.
Deep learning and AI on IBM Power Systems blogs
Read what the experts are saying about deep learning with Watson Machine Learning and IBM PowerAI Vision.
The IBM Power System AC922 can have many physical cores, and with the ability to specify a symmetric multithreading value of 4 (SMT4), this can lead to a very large number of logical processors. This allows a high amount of concurrent work across physical CPU cores. When using the AC922’s GPUs for TensorFlow jobs, the...
Previous blogs and videos have discussed tensor swapping with TensorFlow Large Model Support (TFLMS) while running on the IBM Power Systems AC922. Unlike other systems, IBM Power Systems connect their GPUs to their CPUs using high bandwidth NVLink connections. This has been shown to produce substantial speed improvements to model training while using TensorFlow Large...
Image data channel ordering is usually specified as "channels first" (NCHW) or "channels last" (NHWC). In many cases, operations on GPUs run faster with data in "channels first" format. TensorFlow contains a layout optimizer that will attempt to transpose the data for the fastest computation. The data transformations produce tensors which will consume GPU memory...
TensorFlow Large Model Support (TFLMS) is a Python module that provides an approach to training large models and data that cannot normally be fit in to GPU memory. It takes a computational graph defined by users, and automatically adds swap-in and swap-out nodes for transferring tensors from GPUs to the host and vice versa. During...
Introduction: In PowerAI 1.6, the TensorFlow Large Model Support (TFLMS) module has a new implementation and has graduated from tech preview status. This new implementation can achieve much higher levels of swapping which in turn, can provide training and inferencing with higher resolution data, deeper models, and larger batch sizes. In this article, we investigated...
In PowerAI 1.6, the TensorFlow Large Model Support (TFLMS) module has a new implementation and has graduated from tech preview status. This new implementation can achieve much higher levels of swapping which in turn can provide training and inferencing with higher resolution data, deeper models, and larger batch sizes. For a review of TFLMS and...
Working with Snap ML in WML Accelerator 1.2.0: Spectrum Conductor in WML Accelerator 1.2.0 provides the capability to set up a Spark cluster automatically. To execute an application using snap-ml-spark APIs in a Spectrum Conductor environment in IBM WML Accelerator, either run the snap-ml-spark application through spark-submit in IBM WML Accelerator, or enable snap-ml-spark APIs inside Jupyter Notebooks in IBM...
The latest version of PowerAI is here with version 1.6.0! We’ve been hard at work transforming the packaging and delivery model, updating the versions of the included frameworks and packages and adding new features. Let’s walk through the major changes in 1.6.0. Conda Packaging and the PowerAI conda channel Previous releases of PowerAI introduced interoperability...
IBM PowerAI Enterprise is a powerful platform that enables data scientists with ready-to-use Deep Learning frameworks, hyper-parameter search and optimization for feature engineering, resource utilization optimizations for model training, and several new and compelling features to accelerate the performance of the training job. You can deploy IBM PowerAI Enterprise platform on your on-premise data center...
There are a few steps that need to be performed to use Snap ML in a DSXL environment. These are outlined as follows: Go to the IBM DSXL web console. Log in with your username and password. On the IBM DSXL homepage dashboard, click Add project. Once you click Add project, a new window named Create Project appears...