TensorFlow Large Model Support (TFLMS) is a Python module that provides an approach to training large models and data that cannot normally fit into GPU memory. It takes a computational graph defined by the user and automatically adds swap-in and swap-out nodes for transferring tensors from the GPU to the host and vice versa. During training and inferencing, this makes graph execution operate like operating system memory paging: system memory is effectively treated as a paging cache for GPU memory, and tensors are swapped back and forth between GPU memory and CPU memory.
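The paging analogy can be illustrated with a small, self-contained sketch. This is not the TFLMS API; it is a hypothetical model of the concept, using a capacity-limited "GPU" store with least-recently-used eviction standing in for the swap-out/swap-in nodes that TFLMS inserts into the graph:

```python
# Illustrative sketch only (not the TFLMS API): host memory acts as a
# paging cache for a capacity-limited "GPU" store. All names are invented.
from collections import OrderedDict

class TensorSwapper:
    def __init__(self, gpu_capacity):
        self.gpu_capacity = gpu_capacity   # max tensors resident on the "GPU"
        self.gpu = OrderedDict()           # name -> tensor, in LRU order
        self.host = {}                     # tensors swapped out to CPU memory
        self.swap_outs = 0
        self.swap_ins = 0

    def put(self, name, tensor):
        """Place a tensor on the GPU, evicting the LRU tensor if full."""
        self._make_room()
        self.gpu[name] = tensor
        self.gpu.move_to_end(name)

    def get(self, name):
        """Fetch a tensor, swapping it back in from the host if needed."""
        if name not in self.gpu:
            self.swap_ins += 1
            self._make_room()
            self.gpu[name] = self.host.pop(name)
        self.gpu.move_to_end(name)
        return self.gpu[name]

    def _make_room(self):
        # Swap least-recently-used tensors out to host memory until space exists.
        while len(self.gpu) >= self.gpu_capacity:
            lru_name, lru_tensor = self.gpu.popitem(last=False)
            self.host[lru_name] = lru_tensor
            self.swap_outs += 1

swapper = TensorSwapper(gpu_capacity=2)
swapper.put("act_0", [1.0, 2.0])
swapper.put("act_1", [3.0, 4.0])
swapper.put("act_2", [5.0, 6.0])   # GPU full: act_0 is swapped out to host
t = swapper.get("act_0")           # act_0 swapped back in; act_1 swapped out
```

In the real module the swap decisions are made statically by rewriting the graph rather than dynamically at run time; the papers linked below describe the graph theory and algorithms involved.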
Here are links to blog posts, papers, and videos that describe TensorFlow Large Model Support, its use cases, and its performance characteristics:
- 4-minute introduction to TensorFlow Large Model Support – This video is a good quick introduction to TensorFlow Large Model Support. Note that the performance numbers at the end of this video are now outdated; see the performance links below for updated numbers.
- NVIDIA GPU Technology Conference 2019 presentation – A 40-minute presentation that discusses the use of TFLMS to overcome GPU memory limits, along with the performance characteristics of TFLMS.
- What’s new in PowerAI 1.6 TensorFlow Large Model Support – A blog that describes what’s new in TFLMS in PowerAI 1.6.
- Performance results with TensorFlow Large Model Support v2 – A blog that describes the performance characteristics of TFLMS.
- Tensor swapping with Recurrent Neural Networks in TensorFlow – A look at TensorFlow’s tensor swapping in RNNs and TFLMS with RNNs.
- Automatic GPU memory management for large neural models in TensorFlow – A paper which describes the graph theory and algorithms behind the latest TFLMS.
- TFLMS: Large Model Support in TensorFlow by Graph Rewriting – The original TFLMS paper that describes the graph theory and algorithms of TFLMS.
- A case study using TensorFlow Large Model Support with 3D U-Net for 3D image segmentation
- Fast and Accurate 3D Medical Image Segmentation with Data-swapping Method – This paper compares TFLMS against a patching method for large images. It also compares TFLMS with gradient checkpointing.
- Data-parallel distributed training of very large models beyond GPU capacity – This paper describes a real-world use case of TFLMS with IBM Distributed Deep Learning.
- Performance of 3DUnet Multi GPU Model for Medical Image Segmentation using TensorFlow Large Model Support – This blog post contains performance comparisons of whole system training using TFLMS with IBM Distributed Deep Learning and Horovod on x86 and IBM AC922 servers.
- Image data channel effects on memory usage and performance with TensorFlow Large Model Support