What is Multi-GPU?
Multi-GPU is a common way to train deep learning models with multiple GPUs. Because deep learning calculations can easily be parallelized, this significantly reduces training time. In fact, without multiple GPUs, many deep learning algorithms would be infeasible to train in a reasonable amount of time.
without multiple GPUs, many deep learning algorithms would be infeasible to train in a reasonable amount of time.
When running deep learning experiments on a cluster of GPUs, you can leverage the concept of GPU parallelism. This means that GPUs are combined into one pool of resources, which you can use to run a large experiment much faster than you would on a single GPU.
Not all deep learning frameworks support parallelism, and even if you don’t use it, you can still leverage multiple GPUs to run multiple algorithms or experiments in parallel (each one running on a separate GPU).
How to Choose the Best GPU for Deep Learning
Intensive deep learning tasks, such as identifying and classifying many objects in an image, or training with large amounts of data, can be very demanding, even for GPUs. How can you choose the GPU that is the best fit for your computational requirements?
- The most important feature of a GPU is memory bandwidth. Choose the GPU with the largest available bandwidth within your budget.
- The number of cores determines how fast the GPU can process data. Keep this in mind especially when working with large datasets.
- Video RAM (VRAM) measures the amount of data that a GPU can process at once. Check how much VRAM is required by the algorithms you plan to run in your experiments.
- Overall processing capacity of the GPU is obtained by multiplying clock frequency by the number of cores.
Here are three scenarios that describe the requirements for different types of deep learning projects:
- Simple deep learning model – use a cheap consumer-grade GPU, such as NVIDIA GTX 1030.
- Training large neural networks – use an advanced graphics processors such as NVIDIA RTX2080TI or the more powerful Titan series. You can also use cloud services, such as Google Cloud TPUs and Amazon EC2 P/G series of GPU instances.
- Multiple simultaneous experiments – if you need to run multiple versions of an experiment, for example to test different hyperparameters, you’ll need GPU parallelism. Look for a system with multiple GPUs – there are low-cost options available from several vendors, or enterprise-grade GPU servers from NVIDIA, known as DGX, which cost up to hundreds of thousands of dollars.
Top Metrics for Evaluating Your Deep Learning GPU Performance
Once you have acquired a suitable GPU system for your deep learning needs, you should evaluate how the GPU performs and adjust your configuration if necessary. Below are the key metrics you should evaluate when running experiments on GPUs.
A key measure to look for in a deep learning program is how intensively they use GPUs. This can be easily accessed via GPU monitoring interfaces like NVIDIA-smi. GPU usage is defined as the percentage of time at least one GPU kernel is running, which means it is used by the deep learning framework.
Monitoring GPU usage for deep learning is one of the best indicators for verifying that a GPU is actually being used (by default, most deep learning frameworks will only run on the CPU). You can also monitor real-time usage trends to identify bottlenecks in your code that may be slowing down the training process.
GPU Memory Access and Utilization
GPU memory status is also a good indicator of the number of GPUs used in an intensive deep learning process. NVIDIA-smi provides numerous memory metrics you can optimize to accelerate training.
Memory metrics show, for example, the percentage of time the GPU memory controller has spent reading to / writing from memory in the last few seconds. Other values, such as available memory and used memory, are also important because they show how efficiently your deep learning experiment is using the GPU’s memory. An important use of memory metrics is to adjust the batch size of training samples to fit the available memory.
Power Usage and Temperature
The power consumed by a specific GPUs represents how intensively it is used by the deep learning program, and also how power hungry your experiment is. This is significant when running deep learning on mobile or internet of things (IoT) devices, where power is limited and power consumption is of the deep learning model is critical.
Power consumption is closely related to the temperature at which the GPU is running. As temperature increases, electronic resistance increases, fans run faster and power consumption increases. GPUs are designed to throttle their performance under extreme thermal conditioning, which can slow down execution of deep learning processes.
Time to Solution
Perhaps the most important metric to watch for in a GPU is the time it will take you to reach a working deep learning solution. How many times will you need to run your model until you reach a required level of accuracy? For large-scale experiments, test your model on a GPU to see how long training sessions take, and optimize GPU parameters like mixed-precision and tuning batch and epoch sizes. This will help you understand if, given a certain GPU, it will take you days, weeks or months to reach a working solution.
GPUs are a foundation of deep learning projects, and can dramatically accelerate training processes. In this article we discussed basic GPU concepts, and covered the main parameters for selecting the right GPU for your project:
- Memory bandwidth
- Number of cores
- Video RAM
- Processing capacity (GPU frequency X number of cores)
In addition, we covered essential metrics you should watch when already using a GPU in your deep learning project – utilization, memory access, power usage, and time to solution. We hope this guide will take you one step further towards successful use of GPUs in your deep learning experiments.