Back in 1994, it took four days to run an FFT (Fast Fourier Transform) on a scanned greyscale image on a 486-based system, for my engineering project in digital image processing. Today, it takes a few hours to train a deep learning model on a 32-core CPU system for a task such as music information retrieval (MIR), with a dataset of around 100k audio tracks totalling about 1,000 GB.
What have we achieved in the last 24 years? The ability to perform heavy computation (measured in FLOPS) quickly, and to process huge amounts of data: from four days of feature extraction on a single image to feature extraction across 100k images, at a far larger scale, in hours if not days.
Yet we still look for faster processing. The evolution of neural networks has brought rich feature extraction from thousands of images, identifying patterns in the data for further classification, and this demands a high-compute system. A single image at, say, 320 x 280 resolution already carries 268,800 values to process (320 x 280 x 3 RGB channels), and every layer of a network performs arithmetic over all of them.
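The per-image count above is simple arithmetic; a quick sketch makes the number concrete (the variable names here are illustrative, not from any library):

```python
# Number of input values for one RGB image at 320 x 280 resolution.
# Each pixel carries three channel values (R, G, B), so even a single
# per-value operation over one image touches every one of these.
width, height, channels = 320, 280, 3
values_per_image = width * height * channels
print(values_per_image)  # 268800
```

Multiply that by thousands of images and many network layers per image, and the appeal of parallel hardware is obvious.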
We now have GPU-enabled systems capable of massively parallel processing across thousands of hardware threads. So why wait for days if the same work can be done in hours or minutes? CUDA (Compute Unified Device Architecture) is the framework that supports device-level data movement and memory management.
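Before committing to a long training run, it is worth confirming that the framework can actually see a CUDA device. A minimal check with TensorFlow's public `tf.config` API might look like this (assuming TensorFlow is installed; the snippet falls back gracefully when it is not):

```python
# Check whether TensorFlow can see any CUDA-capable GPU devices.
# If none are listed, training will silently run on the CPU.
try:
    import tensorflow as tf
    gpus = tf.config.list_physical_devices('GPU')
    print(f"GPUs visible to TensorFlow: {len(gpus)}")
except ImportError:
    gpus = []
    print("TensorFlow is not installed in this environment.")
```

An empty list usually means the CUDA driver or toolkit is missing or mismatched with the TensorFlow build, which is the first thing to verify on a new machine.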
As a deep learning developer, how can we leverage the GPU to accelerate the training process? Read about my experience here, on training DL models in an IBM Power9 processor based environment with the Keras & TensorFlow frameworks.
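As a minimal sketch of the kind of workload involved (the model and the synthetic data here are hypothetical, purely for illustration): Keras automatically places supported operations on a visible GPU, so the same script runs unchanged on CPU or GPU.

```python
# Minimal Keras training sketch on synthetic data (hypothetical example).
# When a CUDA GPU is visible, TensorFlow places the matrix math there
# automatically; no code change is needed to move from CPU to GPU.
import numpy as np

try:
    import tensorflow as tf

    # Tiny synthetic dataset: 256 samples, 32 features, 4 classes.
    x = np.random.rand(256, 32).astype("float32")
    y = np.random.randint(0, 4, size=(256,))

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(32,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(4, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x, y, epochs=2, batch_size=32, verbose=0)
    trained = True
except ImportError:
    trained = False
    print("TensorFlow is not installed in this environment.")
```

On a real dataset like the 100k-track MIR corpus mentioned above, it is exactly this `fit` loop whose dense arithmetic the GPU accelerates.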