A little less than a year ago we released Watson Machine Learning Community Edition (WML CE) 1.6.0, which was a ground-up reorganization, refocusing and retooling of our “PowerAI” product. We started up the continuous delivery engine, wrapped it in a flexible Linux-distribution-independent conda-themed paint job and have been tirelessly working ever since on making our distribution of artificial intelligence and machine learning packages the most consumable, secure and enterprise-friendly set available anywhere.
So here we are a year later, and other IBM products such as IBM PowerAI Vision and IBM Watson Machine Learning Accelerator now use and rely on WML CE as a supportable and deployable source of AI technology. WML CE has even been used to discover previously unknown geoglyphs in Peru, one of which is now my personal mascot.
As proud as we are of how far we’ve come over the last year, there is no time to rest, as the engine never stops. Our first release of 2020 is Watson Machine Learning Community Edition 1.7.0. This version brings a major revision bump, and with it some major changes.
GPU Support update
Throughout last year, WML CE 1.6.x releases were paired with various versions of NVIDIA’s CUDA 10.1. One of the major updates in WML CE 1.7.0 is the inclusion of CUDA 10.2. This is the absolute latest release of CUDA, and it brings performance updates, newer GPU compilers and tools, and updated versions of GPU-accelerated libraries including cuBLAS, nvJPEG and cuSOLVER. WML CE ensures it works seamlessly with all of the included GPU-enabled packages. The latest CUDA release notes can always be found on NVIDIA.com. Note that this new version of CUDA also requires an updated GPU driver: CUDA 10.2 needs a driver in the 440.x series. So before pulling down the new packages in WML CE 1.7.0, visit NVIDIA’s Tesla Recommended Drivers (TRD) page to grab the latest recommended driver. WML CE will take it from there.
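As a quick pre-flight check before installing, you could compare the installed driver version against the 440.x requirement. The sketch below is a hypothetical helper, not part of WML CE. It assumes the version string looks like what `nvidia-smi --query-gpu=driver_version --format=csv,noheader` prints (for example `440.33.01`), and it only checks the major version number. Newer driver series also run older CUDA toolkits, so anything 440 or above passes.

```python
import shutil
import subprocess

# CUDA 10.2 needs a driver in the 440.x series or newer.
MIN_DRIVER_MAJOR = 440


def driver_major(version: str) -> int:
    """Return the major component of a driver version string such as '440.33.01'."""
    return int(version.strip().split(".")[0])


def supports_cuda_10_2(version: str) -> bool:
    """Hypothetical check: True if the driver's major version is 440 or later."""
    return driver_major(version) >= MIN_DRIVER_MAJOR


def installed_driver_version():
    """Ask nvidia-smi for the driver version; return None if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    # One line per GPU; all GPUs in a node share the same driver.
    lines = out.split()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = installed_driver_version()
    if version is None:
        print("nvidia-smi not found; is the NVIDIA driver installed?")
    else:
        status = "OK" if supports_cuda_10_2(version) else "too old for CUDA 10.2"
        print(version, status)
```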
NVIDIA’s latest version of TensorRT (version 7.0) has also been included in WML CE 1.7.0. TensorFlow and PyTorch have both been enabled to work with it, as usual. TensorRT 7.0 brings enhancements along with a few API changes that break compatibility with some previous usage models, so be sure to read the docs and adjust your code as necessary.
TensorFlow 2: Reporting for service!
The TensorFlow community tagged version 2.0 last fall. It was a major departure from the 1.x series. We opted to include the final 1.x release (1.15) in our final 2019 release of WML CE to allow time for everyone (users, the TensorFlow community, ourselves) to digest the changes, build up supporting examples and working models, and (honestly) get everything working as designed on the 2.x API set.
With the 2.1 release, many of the growing pains have been addressed, the APIs have stabilized, and bugs have been squashed, so we are ready to introduce the first fully supported TensorFlow 2.x version in WML CE! TensorFlow 2.1 ships with eager execution enabled by default and tf.keras as the default high-level API, so there is a good chance your existing TensorFlow code will have to change to run on TF 2.1. Beyond that, code will likely need some rewriting to become TensorFlow 2 “native”. It takes a bit of work, but native TF2 code is not only more supportable and easier to debug, it is also faster!
As an intermediate step towards TF2 nativity, use the migration guide to learn about the tf.compat.v1 module and the tf.compat.v1.disable_v2_behavior() API, as well as the TensorFlow 2 upgrade script (tf_upgrade_v2), which is shipped and installed in every conda environment that you install WML CE TensorFlow into.
For truly native TF2 code, every TensorFlow 1.x session.run() call must be replaced by a plain Python function call (optionally compiled into a graph with tf.function). Consider a complete rewrite of your models focusing on the easy-to-use Keras API. Speaking of Keras, WML CE 1.7.0 includes the multi-backend “Keras Team” version of Keras as a separate add-on package, but this version will not be updated in the future. Going forward, Keras will live on as tf.keras, included in TensorFlow as the preferred high-level API. Migrating all Keras code to the appropriate tf.keras APIs is recommended and also pretty painless. TensorFlow 2.1 also includes a new Keras mixed precision API, which takes advantage of hardware support for lower-precision types such as float16. This mode can be enabled with:
from tensorflow.keras.mixed_precision import experimental as mixed_precision
mixed_precision.set_policy(mixed_precision.Policy('mixed_float16'))
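Why “mixed” rather than pure float16? Under the mixed_float16 policy, Keras layers compute in float16 but keep their variables in float32. The stdlib-only sketch below needs no TensorFlow and hints at one reason for that split. It rounds values through IEEE half and single precision using Python’s struct module and shows that many small updates to a weight can vanish entirely in float16. The helper names are illustrative only and are not part of any TensorFlow API.

```python
import struct


def to_float16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_float32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(weight: float, update: float, steps: int, rounder) -> float:
    """Apply `steps` small updates, rounding the weight after every step."""
    w = rounder(weight)
    for _ in range(steps):
        w = rounder(w + update)
    return w


if __name__ == "__main__":
    # float16 has a 10-bit mantissa: near 1.0 its spacing is 2**-10 (~0.00098),
    # so adding 1e-4 rounds straight back to 1.0 and the weight never moves.
    print("float16:", accumulate(1.0, 1e-4, 100, to_float16))
    # float32 spacing near 1.0 is 2**-23, so the same updates accumulate.
    print("float32:", accumulate(1.0, 1e-4, 100, to_float32))
```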
We’ve also completely rewritten our TensorFlow Large Model Support (LMS) extension for TensorFlow 2. The new version extends TensorFlow’s GPU memory allocator and management functions deep in the core of TensorFlow itself, so no extra package is needed. Instead, LMS is always present in our TensorFlow build, and enabling LMS in TensorFlow 2.1 takes a simple configuration setting:
An experimental GPU memory defragmentation feature is also included, but it is disabled by default for safety. To enable GPU defragmentation, use:
In addition, this new LMS code has been given a friendly open source license and is available on GitHub in patch form, ready to be applied to a checkout of the TF 2.1 tag. There is more information on the new LMS for TensorFlow 2 in the README and in the IBM Knowledge Center.
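To make the LMS idea concrete: when a new allocation does not fit in GPU memory, an LMS-style allocator can swap tensors that are not currently needed out to host memory and bring them back on their next use. The toy model below is purely illustrative. It is not IBM’s implementation and moves no real memory. The class name, the least-recently-used eviction order and the byte accounting are all assumptions made for this sketch.

```python
from collections import OrderedDict


class ToySwappingAllocator:
    """Hypothetical toy model of swap-to-host memory management (illustrative only)."""

    def __init__(self, gpu_capacity: int):
        self.gpu_capacity = gpu_capacity
        self.on_gpu = OrderedDict()  # name -> size, least recently used first
        self.on_host = {}            # name -> size, tensors swapped out to host
        self.swaps_out = 0
        self.swaps_in = 0

    def gpu_used(self) -> int:
        return sum(self.on_gpu.values())

    def _make_room(self, size: int) -> None:
        # Swap least-recently-used tensors out to host until `size` bytes fit.
        while self.gpu_used() + size > self.gpu_capacity:
            if not self.on_gpu:
                raise MemoryError("tensor larger than total GPU capacity")
            name, victim_size = self.on_gpu.popitem(last=False)
            self.on_host[name] = victim_size
            self.swaps_out += 1

    def allocate(self, name: str, size: int) -> None:
        """Place a new tensor on the GPU, swapping others out if necessary."""
        self._make_room(size)
        self.on_gpu[name] = size

    def use(self, name: str) -> None:
        """Touch a tensor before a kernel reads it, swapping it back in if needed."""
        if name in self.on_gpu:
            self.on_gpu.move_to_end(name)  # now the most recently used
            return
        size = self.on_host[name]
        self._make_room(size)
        del self.on_host[name]
        self.on_gpu[name] = size
        self.swaps_in += 1


if __name__ == "__main__":
    alloc = ToySwappingAllocator(gpu_capacity=100)
    alloc.allocate("activations_layer1", 60)
    alloc.allocate("activations_layer2", 60)  # does not fit: layer1 is swapped out
    alloc.use("activations_layer1")           # backward pass needs layer1 again
    print(alloc.swaps_out, alloc.swaps_in)
```

The trade-off the toy makes visible is the one LMS manages for real: models larger than GPU memory can still train, at the cost of extra host-to-GPU transfers.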
TensorFlow Probability, Estimator, TensorBoard and TensorFlow Serving all have been updated in WML CE 1.7.0 to levels compatible with TensorFlow 2.1.
PyTorch catching up
A lot has been said recently about PyTorch catching up with TensorFlow in terms of user base. We know for certain it has caught up in release cadence, with PyTorch 1.3, 1.3.1 and 1.4 all released since October 2019.
WML CE’s PyTorch 1.3.1 includes several interesting new features. The experimental “Named Tensor” support allows you to associate names with tensor dimensions. Named tensors improve code safety and will make code more self-documenting. Automatic type promotion is also new and allows safe and sane intermixing of tensor types without the need for explicit type conversions. PyTorch’s TensorBoard interface gets better and adds support for 3D meshes and hyperparameter logging. Support for model serialization is improved by many enhancements and fixes to TorchScript. Other PyTorch 1.3.1 bug fixes and features are described in the release notes for 1.3.0 and 1.3.1.
Note that a few incompatibilities could require minor changes to existing PyTorch scripts. Most, but not all, upstream features are included in WML CE. In particular, we’re still evaluating the experimental quantization support, which is not yet included in WML CE.
PyTorch 1.3.1 retains the previously added Large Model Support (LMS) implementation. Details on PyTorch LMS are in the Knowledge Center.
Expanding support for SnapML
In previous WML CE releases, IBM’s own SnapML distributed, accelerated machine learning framework gained new algorithms, including Decision Trees and Random Forest, which work as drop-in replacements for calls to scikit-learn. In WML CE 1.7.0, the Decision Tree algorithm gains both a multi-threaded CPU version and a GPU-accelerated version. The Random Forest algorithm gains full multi-GPU support. The included snapml-spark package turns the crank and gains Spark 2.4 support. The pai4sk package is loaded with new examples that showcase the speedup and accuracy gained with these optimized SnapML tree-based solvers. Refer to the API documentation page for usage details.
In addition to the new GPU support, the SnapML packages (pai4sk and snapml-spark) now have x86-64 versions as well, so users can benefit from SnapML’s capabilities on all the platforms they own. SnapML’s SnapBoost algorithm remains a Technology Preview.
We all dance together, Horovod!
By popular demand, Horovod 0.19 has been added to Watson Machine Learning Community Edition 1.7.0. Horovod is a popular distributed deep learning coordinator for TensorFlow, Keras, and PyTorch. In WML CE, Horovod uses NCCL with MPI to communicate among nodes. Horovod and IBM’s own DDL have evolved together, and there are many similarities, including a horovodrun command that does much the same job as ddlrun. Between Horovod, NCCL and DDL, there is no shortage of excellent solutions for efficient multi-node training.
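Under the hood, data-parallel trainers such as Horovod combine gradients across workers with an allreduce, and a ring allreduce is one of the algorithms NCCL can use for it. The pure-Python simulation below sketches a ring allreduce as a reduce-scatter phase followed by an allgather phase, over in-memory lists standing in for each worker’s gradient. It is a teaching model under simplifying assumptions (a single process, no real communication, gradient length divisible by the number of workers), not Horovod’s or NCCL’s code.

```python
from typing import List


def ring_allreduce(grads: List[List[float]]) -> List[List[float]]:
    """Simulate a ring allreduce: every worker ends with the elementwise sum.

    grads[r] is worker r's gradient. Each vector is split into n equal
    chunks, where n is the number of workers (assumption: the length is
    divisible by n).
    """
    n = len(grads)
    length = len(grads[0])
    assert length % n == 0, "sketch assumes length divisible by worker count"
    size = length // n
    data = [list(g) for g in grads]  # copy so the inputs are left untouched

    def chunk(c: int) -> slice:
        return slice(c * size, (c + 1) * size)

    # Phase 1, reduce-scatter: at step s, worker r sends chunk (r - s) % n to
    # its right neighbour, which adds it in. After n - 1 steps, worker r holds
    # the complete sum of chunk (r + 1) % n.
    for step in range(n - 1):
        sends = [(r, (r - step) % n) for r in range(n)]
        payloads = [data[r][chunk(c)] for r, c in sends]  # snapshot before receiving
        for (r, c), payload in zip(sends, payloads):
            dst, s = (r + 1) % n, chunk(c)
            data[dst][s] = [a + b for a, b in zip(data[dst][s], payload)]

    # Phase 2, allgather: pass each completed chunk around the ring,
    # overwriting the neighbour's partial copy.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n) for r in range(n)]
        payloads = [data[r][chunk(c)] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            data[(r + 1) % n][chunk(c)] = payload
    return data


if __name__ == "__main__":
    grads = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    summed = ring_allreduce(grads)
    print(summed[0])  # [12.0, 15.0, 18.0], identical on every worker
```

Each worker sends and receives only 2(n - 1) chunks of size length/n, which is why the ring pattern scales well with the number of workers. To get the averaged gradient used in data-parallel SGD, divide each element by the number of workers.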
RAPIDS.ai rapidly updating
This release of WML CE includes cuDF 0.11.0 and cuML 0.11.0. There is now also some SnapML integration with cuDF (DeviceNDArray input is supported) and cuML. Dask support for GPU-backed dataframes has come a long way in the RAPIDS community. CuPy, dask-cuda and dask-cudf are updated in WML CE 1.7.0, but they, along with the multi-GPU machine learning algorithms, remain in technology preview for this release.
Linear models in SnapML have supported DeviceNDArray input for training since WML CE 1.6.2. There are lots of new examples here as well that showcase the speedup of SnapML working with CuPy, yet another GPU array technology. The pai4sk package continues to support certain cuML algorithms when a cuDF dataframe is passed in as input. Please refer to the pai4sk documentation page for more details.
The engine churns onward…
We hope WML CE has helped with the speed and maintainability of your deployments. The engine doesn’t stop and these technologies are just getting started. As usual, let us know how we’re doing or what you would like to see in Watson Machine Learning Community Edition (I vote for fancy rims and some chrome accents).
We’d also love to know what you are using Watson Machine Learning Community Edition for! If you are using WML CE for a project, let us know, and, like the geoglyphs in this blog, I may highlight your work in the next one!