TensorFlow Large Model Support (TFLMS) is a Python module that provides an approach to training large models and data that cannot normally be fit in to GPU memory. It takes a computational graph defined by users, and automatically adds swap-in and swap-out nodes for transferring tensors from GPUs to the host and vice versa. During training and inferencing this makes the graph execution operate like operating system memory paging. The system memory is effectively treated as a paging cache for the GPU memory and tensors are swapped back and forth between the GPU memory and CPU memory.

Here are links to blog posts, papers, and videos that describe TensorFlow Large Model support, use cases, and performance characteristics:

Join The Discussion

Your email address will not be published. Required fields are marked *