Build a movie recommendation system with and without GPU

Deep learning algorithms are data thirsty. They also need high computing capabilities to be able to process these large data sets. It is impossible to talk about the performance of a model without talking about the underlying hardware architecture that supports model training.

Even though the foundations of deep learning date back to the mid-1960s, significant breakthroughs had to wait for hardware that could train these models at scale. GPUs, first marketed under that name in 1999, went on to provide it. The need for GPUs in building deep learning models is mentioned so often that the two terms have become almost synonymous.

In this article, you learn how to build a movie recommendation engine and train the model in two ways: on CPUs and on GPUs. The article covers:

  1. The differences between CPUs and GPUs
  2. The Movie Company ABC data science team
  3. How to train the movie recommendation engine on CPUs
  4. How to train the movie recommendation engine on GPUs with Watson Machine Learning Accelerator
  5. How performance compares on CPU versus GPU

CPU versus GPU

A central processing unit (CPU) is a general-purpose integrated circuit. Think of it as the brain of a computer that is responsible for carrying out various calculations on data. The CPU primarily consists of three components: the arithmetic logic unit (ALU), the control unit, and registers.

Working in tandem, these components fetch instructions, decode each instruction, and perform calculations on data. With clock speeds usually ranging from 2.0 to 4.0 GHz, the CPU is best suited for tasks that require low latency. It is also well suited for sequential tasks.
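
The fetch, decode, and execute cycle can be illustrated with a toy interpreter. This is a minimal sketch and not a model of any real CPU: the three-field instruction format, the register names, and the `run` function are all hypothetical.

```python
# Toy fetch-decode-execute loop. The instruction set and register names
# are hypothetical and exist only to illustrate the cycle.

def run(program):
    registers = {"r0": 0, "r1": 0}        # small register file
    pc = 0                                 # program counter (control unit state)
    while pc < len(program):
        instruction = program[pc]          # fetch
        op, dest, operand = instruction    # decode
        if op == "load":                   # execute: store a constant in a register
            registers[dest] = operand
        elif op == "add":                  # execute: ALU adds another register to dest
            registers[dest] += registers[operand]
        else:
            raise ValueError(f"unknown instruction: {op}")
        pc += 1                            # move on to the next instruction
    return registers


program = [
    ("load", "r0", 2),
    ("load", "r1", 3),
    ("add", "r0", "r1"),
]
result = run(program)
print(result)
```

Each instruction finishes before the next one starts, which is the sequential, low-latency style of work that the CPU handles well.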

In contrast, a graphics processing unit (GPU) is a special-purpose integrated circuit. Embedded on graphics cards, GPUs were initially used in the gaming industry to provide high-quality and seamless visual effects. Backed by thousands of cores, with each core's speed ranging from 1.0 to 2.0 GHz, they are best for tasks that require high throughput.

Consider an analogy: even though a lion is stronger than any single wolf, a pack of wolves can defeat the lion. Similarly, while the individual cores within a GPU are not as versatile as the ones within a CPU in terms of the range of tasks that they can perform, together they outperform CPUs at parallel processing.
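
The trade-off between a few fast cores and many slower cores can be made concrete with a back-of-the-envelope estimate. The sketch below is a toy model under stated assumptions: the core counts, clock speeds, and the fixed `cycles_per_task` are hypothetical, tasks are assumed to be independent and equal in size, and memory transfer and scheduling costs are ignored.

```python
import math


def estimated_seconds(num_tasks, cores, clock_ghz, cycles_per_task=1_000_000):
    """Toy estimate: independent, equal-sized tasks run in waves of `cores`."""
    waves = math.ceil(num_tasks / cores)
    seconds_per_task = cycles_per_task / (clock_ghz * 1e9)
    return waves * seconds_per_task


# Hypothetical devices: a few fast cores versus many slower cores.
cpu = {"cores": 8, "clock_ghz": 4.0}
gpu = {"cores": 4096, "clock_ghz": 1.5}

for num_tasks in (1, 100_000):
    cpu_time = estimated_seconds(num_tasks, **cpu)
    gpu_time = estimated_seconds(num_tasks, **gpu)
    winner = "CPU" if cpu_time < gpu_time else "GPU"
    print(f"{num_tasks} tasks: CPU {cpu_time:.6f}s, GPU {gpu_time:.6f}s -> {winner}")
```

In this model the CPU finishes a single task sooner (lower latency), while the GPU finishes a large batch sooner (higher throughput).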

Because much of the computation in deep learning consists of large matrix operations whose parts are independent of each other, a pattern often described as "embarrassingly parallel", these tasks can be parallelized by running them on GPUs. Also, with the advent of parallel computing platforms and APIs such as CUDA and OpenCL, data scientists can take advantage of GPUs without having to work with low-level languages.
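
To see why this kind of work parallelizes so easily, consider a matrix-vector product, the basic operation inside a neural network layer. In the following sketch, each output element depends only on one row of the matrix, so the rows can be processed in any order or all at once. The function names and the small `weights` matrix are illustrative assumptions, and Python threads are used only to show the independence of the work, not to gain speed.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(row, vector):
    return sum(a * b for a, b in zip(row, vector))


def matvec_sequential(matrix, vector):
    # One row after another, as a single core would do it.
    return [dot(row, vector) for row in matrix]


def matvec_parallel(matrix, vector, workers=4):
    # Each row's dot product needs only that row and the vector, so the
    # rows can be handed to separate workers with no coordination.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: dot(row, vector), matrix))


weights = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
inputs = [1, 0, -1]
sequential = matvec_sequential(weights, inputs)
parallel = matvec_parallel(weights, inputs)
print(sequential, parallel)
```

On a GPU, the same idea is applied at a much larger scale, with thousands of cores each computing a small piece of the result.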

Movie Company ABC data science team

Recommendation systems are one of the most valuable applications of deep learning. The availability of abundant structured data about shopping patterns, combined with high compute capabilities, makes recommendation systems an apt use case for deep learning.

In this article, Sophie and Victoria from Movie Company ABC’s data science team both build a recommendation engine with a restricted Boltzmann machine using TensorFlow. However, to train this model, Sophie uses a CPU-based environment and Victoria uses a GPU-based environment.
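
For readers who have not met a restricted Boltzmann machine before, the following is a minimal sketch of one, written in plain Python rather than TensorFlow so that it runs anywhere. It is not the code from the notebook: the `TinyRBM` class, the made-up `ratings` data, and the hyperparameters are assumptions for illustration. It trains with one step of contrastive divergence (CD-1) and uses hidden probabilities rather than samples in the weight update, a common simplification.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyRBM:
    """Binary restricted Boltzmann machine trained with CD-1 (illustrative only)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.n_visible, self.n_hidden = n_visible, n_hidden
        self.w = [[self.rng.gauss(0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_visible = [0.0] * n_visible
        self.b_hidden = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_hidden[j]
                        + sum(v[i] * self.w[i][j] for i in range(self.n_visible)))
                for j in range(self.n_hidden)]

    def visible_probs(self, h):
        return [sigmoid(self.b_visible[i]
                        + sum(self.w[i][j] * h[j] for j in range(self.n_hidden)))
                for i in range(self.n_visible)]

    def cd1_step(self, v0, lr=0.1):
        h0 = self.hidden_probs(v0)                       # positive phase
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)               # reconstruction
        h1 = self.hidden_probs(v1)                       # negative phase
        for i in range(self.n_visible):
            for j in range(self.n_hidden):
                self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b_visible[i] += lr * (v0[i] - v1[i])
        for j in range(self.n_hidden):
            self.b_hidden[j] += lr * (h0[j] - h1[j])

    def reconstruct(self, v):
        return self.visible_probs(self.hidden_probs(v))


# Made-up data: each row is one user, each column one movie (1 = liked).
ratings = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
rbm = TinyRBM(n_visible=4, n_hidden=2)
for epoch in range(5):
    for user in ratings:
        rbm.cd1_step(user)

# Scores for a user who liked only the first movie; a recommender would
# rank unseen movies by these reconstructed probabilities.
scores = rbm.reconstruct([1, 0, 0, 0])
print([round(s, 3) for s in scores])
```

The nested loops over visible and hidden units are exactly the kind of independent arithmetic that a framework such as TensorFlow expresses as matrix operations and can offload to a GPU.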

The notebook with the TensorFlow code to build the movie recommendation engine and model training information can be found in the Train a movie recommendation engine with Watson Machine Learning Accelerator notebook. The details of the code to build the model have been abstracted in this article so that you can focus on the model training aspects. In the following sections, I highlight code snippets that are essential to follow.

Train the movie recommendation engine on CPUs

In the first method, Sophie from Movie Company ABC’s data science team trains the deep learning model on a system that uses only a CPU.

For this experiment, Sophie used IBM Cloud Pak for Data to run the training in a CPU-based environment with one virtual CPU (vCPU).

As seen in the following snippet, Sophie starts the model training in the cloud environment with 5 epochs.

CPU input

For a data set with 200,000 records, you see that it takes 273 seconds for just the model training and 292 seconds for the entire run to finish.

CPU output
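
The two numbers reported for the run, training time and time for the entire run, come from timing two nested spans. The following sketch shows one way to measure them. It is an assumption about how such timings could be taken, not the notebook's code, and `load_data` and `train_one_epoch` are hypothetical stand-ins for the real data preparation and training steps.

```python
import time


def load_data():
    # Hypothetical stand-in for reading and preparing the data set.
    return list(range(1000))


def train_one_epoch(data):
    # Hypothetical stand-in for one pass of model training.
    return sum(x * x for x in data)


EPOCHS = 5  # same epoch count as the run described in this section

run_start = time.perf_counter()
data = load_data()

train_start = time.perf_counter()
for epoch in range(EPOCHS):
    train_one_epoch(data)
training_seconds = time.perf_counter() - train_start

total_seconds = time.perf_counter() - run_start
print(f"training: {training_seconds:.4f}s, entire run: {total_seconds:.4f}s")
```

The difference between the two values is the overhead outside of training, such as loading data and setting up the environment.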

Train the movie recommendation engine on GPUs

In the second method, Victoria trains the same model on a system that has access to a GPU. Victoria uses the Watson Machine Learning Accelerator service under IBM Cloud Pak for Data. While this environment can be equipped with various GPUs, she uses one NVIDIA V100 Tensor Core GPU.

Watson Machine Learning Accelerator

Available as part of the Watson Studio service within IBM Cloud Pak for Data (or installed as a stand-alone offering on-premises), Watson Machine Learning Accelerator is a GPU-supported platform that provides acceleration and support for several challenges that you typically face during the development of deep learning models.

Watson Machine Learning Accelerator helps data scientists within the same organization share resources through elastic distributed training of workloads. It also supports automated hyperparameter optimization. For a complete overview of Watson Machine Learning Accelerator, review the Get started with Watson Machine Learning Accelerator learning path.

Model training on Watson Machine Learning Accelerator can be submitted in three ways:

  1. Use the Watson Machine Learning Accelerator REST API directly from a Jupyter Notebook. This Jupyter Notebook can be on any platform, such as locally on your system or within Watson Studio in Cloud Pak for Data.
  2. Use the Watson Machine Learning API directly through a Jupyter Notebook.
  3. Use the Watson Machine Learning API indirectly through experiment builder in Watson Studio.

Each of these approaches is discussed in detail in the How to use Watson Machine Learning Accelerator article. This article follows the first method.

Victoria installs cuDF, a GPU DataFrame library that she uses to handle the data set, and TensorFlow 2.3 to train her model in Watson Machine Learning Accelerator. She creates a conda environment that contains all of the required packages to run the model training.

As seen in the following snippet, Victoria starts the model training by submitting a REST request to the Watson Machine Learning Accelerator API.

GPU input
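
The general shape of submitting a training job over REST can be sketched as follows. This is only an illustration of the pattern: the host, the endpoint path, the payload fields, and the token are all placeholders and do not reflect the documented Watson Machine Learning Accelerator API, so check the product's API reference for the real endpoint and arguments. The request is built but deliberately not sent.

```python
import json
import urllib.request

# All values below are placeholders, NOT the documented API.
BASE_URL = "https://wmla.example.com/api"   # hypothetical host
SUBMIT_PATH = "/training-jobs"              # hypothetical endpoint path
AUTH_TOKEN = "REPLACE_WITH_TOKEN"           # hypothetical credential


def build_submit_request(model_file, gpus_per_worker=1, epochs=5):
    # Hypothetical payload: which script to run and what resources to ask for.
    payload = {
        "framework": "tensorflow",
        "model_main": model_file,
        "gpus_per_worker": gpus_per_worker,
        "epochs": epochs,
    }
    return urllib.request.Request(
        url=BASE_URL + SUBMIT_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {AUTH_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_submit_request("main.py")
print(request.get_method(), request.full_url)
# With a real endpoint and credentials, the request would be sent with
# urllib.request.urlopen(request) and the response checked for a job ID.
```

Because the job is submitted to a shared service, the end-to-end time includes waiting for a GPU to be scheduled in addition to the training itself.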

You see that for the same 200,000 records, the training time alone is 5 seconds, and the end-to-end process, which includes scheduling a GPU, takes 61 seconds.

GPU output

Compare performances on CPU versus GPU

With the same MovieLens data set that Sophie and Victoria used to develop the movie recommendation engine, you see significant differences in the key metrics. The following table shows the experimental setup and outcomes from Sophie and Victoria based on their model development strategies.

| Metric | Sophie (CPU) | Victoria (GPU) |
| Environment | IBM Cloud Pak for Data | Watson Machine Learning Accelerator under IBM Cloud Pak for Data |
| Hardware | 1 vCPU | 1 NVIDIA V100 Tensor Core GPU |
| Records | 200,000 | 200,000 |
| Epochs | 5 | 5 |
| Training time | 273 seconds | 5 seconds |
| End-to-end time | 292 seconds | 61 seconds |


From the table, you see how training on a GPU using Watson Machine Learning Accelerator to build a movie recommendation engine saved Victoria significant time and effort. Even though all parameters of model training were kept common between Sophie and Victoria, Victoria's model trained approximately 55 times faster (5 seconds versus 273 seconds) with the use of just one GPU. Measured end to end, including the time to schedule the GPU, her run was close to 5 times faster (61 seconds versus 292 seconds).
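
The speedup can be recomputed from the timings reported for the two runs. The short calculation below uses only those four numbers; the variable names are illustrative.

```python
# Timings reported in this article, in seconds.
cpu_training_s, cpu_total_s = 273, 292
gpu_training_s, gpu_total_s = 5, 61

training_speedup = cpu_training_s / gpu_training_s
end_to_end_speedup = cpu_total_s / gpu_total_s

# Time spent outside of training (data handling, setup, GPU scheduling).
cpu_overhead_s = cpu_total_s - cpu_training_s
gpu_overhead_s = gpu_total_s - gpu_training_s

print(f"training speedup:   {training_speedup:.1f}x")
print(f"end-to-end speedup: {end_to_end_speedup:.1f}x")
print(f"overhead: CPU {cpu_overhead_s}s, GPU {gpu_overhead_s}s")
```

The gap between the two ratios shows that for a short job like this one, most of the GPU run's end-to-end time is overhead rather than training, so the benefit grows as training itself gets longer.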


Deep learning algorithms are data hungry and need high computing capabilities. This article compared training the same movie recommendation engine on a CPU and on a GPU, and showed how Watson Machine Learning Accelerator can help you accelerate your model training.