IBM Developer Blog


Learn how one team was able to use AutoAI to fully replace a missing data input with a prediction based on historical data.

With contributions by Aaron Baughman, Eythan Holladay, John Kent, Tyler Sidell, Jenna Miller, and Jeff Gottwald.

In April 2019, IBM® introduced a system that automatically produces the “Round in Under 3 Minutes” video highlight package for every player in every round of the Masters Golf Tournament. This system built on the award-winning work from the previous year’s tournament, as well as on work for several other sports.

Image of Kevin Kisner

Automating the creation of these videos dramatically reduced the time and effort required to produce this content. Producing it quickly gave the Masters editorial team a first-mover advantage and coverage across the entire field, and it freed up critical video editing cycles for other work.

This automated system ingested video from every shot on every hole. The workflow that created the input content later went on to win the George Wensel Technical Achievement Emmy Award as well as many other accolades. As this content was ingested, it was evaluated for “highlight-worthiness” using a metric referred to internally by IBM as “excitement.” The excitement metrics were derived using artificial intelligence (AI) analysis of the content. These metrics were further enhanced by using IBM Watson® OpenScale to remove bias from the ranking. These excitement metrics, coupled with storytelling business rules, were used to select which golf scenes to include. These selections were then used to create a fully produced highlight video with broadcast interstitials and automated TV graphics minutes after the player completed their round. The work for the 2019 Masters was described in this IBM Developer blog.
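The article does not publish the scene selection logic. The JavaScript sketch below is only a rough illustration of how per-clip excitement scores and storytelling rules could interact. It greedily picks the most exciting clips that fit a 180-second budget, matching the “Under 3 Minutes” format. It always keeps the round's final shot, and it then orders the result by hole. The field names (`hole`, `shot`, `excitement`, `duration`, `isFinalShot`) and both rules are assumptions, not IBM's actual business rules.

```javascript
// Hypothetical sketch: choose highlight clips for one player's round.
// Field names and both rules are illustrative assumptions,
// not the production IBM business rules.
function selectClips(clips, maxSeconds = 180) {
  // Assumed rule 1: the shot that completes the round is always included.
  const mandatory = clips.filter((c) => c.isFinalShot);
  // All other clips are considered from most to least exciting.
  const candidates = clips
    .filter((c) => !c.isFinalShot)
    .sort((a, b) => b.excitement - a.excitement);

  const selected = [...mandatory];
  let total = mandatory.reduce((sum, c) => sum + c.duration, 0);
  for (const clip of candidates) {
    // Assumed rule 2: keep the package within the time budget (180 s by default).
    if (total + clip.duration <= maxSeconds) {
      selected.push(clip);
      total += clip.duration;
    }
  }
  // Present the chosen clips in course order, not excitement order.
  return selected.sort((a, b) => a.hole - b.hole || a.shot - b.shot);
}
```

A greedy pass is the simplest way to respect a hard time budget. A production system would likely weigh further rules, such as narrative variety across the round.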

Fast forward to 2020. The Masters was postponed until November, and the decision was made that patrons would not be present. This presented a new challenge because one of the inputs for the IBM excitement rankings is an evaluation of the sounds from the course, specifically the unique properties and durations of the “Masters Roar” from the crowd. Without patrons, this input would be missing, which would create a significant challenge for the scene selection process that relies on the excitement metrics.

Brainstorming on this challenge, the IBM iX team supporting the Masters project came up with a solution to predict the missing crowd noise metrics. After all, they had 20,000 video clips and debiased rankings from the 2019 tournament. A model could be built to predict what the crowd noise excitement levels should be. These predicted crowd noise metrics could then be used to replace the missing crowd noise, with few changes to the rest of the system. The team decided to use the IBM AutoAI solution on IBM Cloud to create the crowd noise prediction model.

Environment setup

The first steps are to add the AutoAI service to your IBM Watson Studio space on IBM Cloud, and to create an AutoAI experiment.

Add Watson AutoAI

New AutoAI experiment

Data prep

Next, a data set is needed. In this case, it was historical data from the 2019 Masters Round in Under 3 Minutes workflow. This rich data set included traditional golf statistics, ball tracking data, crowd noise, and excitement rankings that had been debiased using Watson OpenScale.

The AutoAI system uses CSV files as the data set. This is a low-complexity format that can be put together with little effort. The 2019 data was stored in an IBM Cloudant database on IBM Cloud. To get the correct data, the team built a simple Cloudant view that would return a long JSON array containing all records.

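```javascript
// Map function: emit the fields needed for training from each ranked 2019 clip.
function (doc) {
  if ((doc.year == 2019) && (doc.type == 'aiclip') && doc.workflow && (doc.workflow.emotion_ranking_completed == true)) {
    emit(doc._id, {"metadata": doc.metadata, "emotion_scores": doc.emotion_scores, "clip_duration": doc.clip_duration, "url": doc.url});
  }
}
```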
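A Cloudant view is read with an HTTP `GET` on `/{db}/_design/{ddoc}/_view/{view}`. The response carries a `rows` array whose entries have `id`, `key`, and `value`, where `value` is the object passed to `emit()`. The sketch below assumes two things:

- Node.js 18 or later, for the global `fetch`.
- An IAM bearer token obtained separately.

The account URL, database, design document, and view names are placeholders, not the names used by the project.

```javascript
// Build the URL for a Cloudant/CouchDB view query.
// The account URL, database, design doc, and view names are placeholders.
function viewUrl(account, db, ddoc, view) {
  return `${account}/${encodeURIComponent(db)}/_design/${encodeURIComponent(ddoc)}/_view/${encodeURIComponent(view)}`;
}

// Fetch every row the view emits and return only the emitted values.
// Assumes Node.js 18+ (global fetch) and a bearer token obtained elsewhere.
async function fetchViewValues(url, token) {
  const res = await fetch(url, { headers: { Authorization: `Bearer ${token}` } });
  if (!res.ok) {
    throw new Error(`View query failed: HTTP ${res.status}`);
  }
  const body = await res.json();
  // Each row has the shape { id, key, value }; value is what emit() produced.
  return body.rows.map((row) => row.value);
}
```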

Using the JSON result from Cloudant, the IBM iX team built a script to convert this data into a 20,000-line CSV file with 32 columns of historical data points from 2019. These included basic data points such as player ID, round, hole, and score, as well as tracking stats such as the zone the player shot from, the zone where the ball came to rest, distance to pin, and shot length. The last column was the debiased excitement score, which the system would learn to predict from the other columns. The following image shows a subset of the CSV data used for training.
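The conversion script itself is not included in the article. Below is a minimal sketch of the idea. It assumes the emitted `value` objects from the view above and dotted column paths such as `metadata.hole`. These column names are hypothetical, because the real 32-column schema is not listed.

The script walks each record and reads the requested nested fields. It then quotes any field that contains a comma, quote, or line break, as RFC 4180 requires. Missing values become empty fields.

```javascript
// Read a nested property such as "metadata.hole" from an object.
function getPath(obj, path) {
  return path.split('.').reduce((o, key) => (o == null ? undefined : o[key]), obj);
}

// Quote a CSV field when it contains a comma, quote, or line break (RFC 4180).
function csvField(value) {
  if (value === undefined || value === null) return '';
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Convert emitted view values into CSV text: one header row, then one row per clip.
// The caller supplies the column list, which also fixes the column order.
function toCsv(records, columns) {
  const lines = [columns.map(csvField).join(',')];
  for (const rec of records) {
    lines.push(columns.map((col) => csvField(getPath(rec, col))).join(','));
  }
  return lines.join('\n');
}
```

Listing the debiased excitement score as the final entry in `columns` reproduces the layout described above, with the target in the last column.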

CSV data for training

Create model

The prepared data set was then uploaded to a project in the IBM Cloud Pak® for Data platform on IBM Cloud so that it could be used to create a crowd noise prediction web service.

Create score data assets

Using AutoAI within the IBM Cloud Pak for Data project, we assigned our target variable (2019 crowd scores) and set other parameters, such as the train-test split (90/10) and the evaluation metric (root mean squared error, or RMSE).
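AutoAI handles the split and the scoring internally, so no code was needed at this step. To make the two settings concrete, the sketch below shows two pieces:

- A plain random 90/10 holdout split, using a Fisher-Yates shuffle with an injectable random source.
- The RMSE metric, RMSE = sqrt((1/n) · Σ (yᵢ − ŷᵢ)²).

AutoAI's own splitting strategy is not described in the article and may differ from this simple random split.

```javascript
// Root mean squared error between actual and predicted values.
function rmse(actual, predicted) {
  if (actual.length !== predicted.length || actual.length === 0) {
    throw new Error('Inputs must be non-empty and the same length');
  }
  let sumSq = 0;
  for (let i = 0; i < actual.length; i++) {
    const err = actual[i] - predicted[i];
    sumSq += err * err;
  }
  return Math.sqrt(sumSq / actual.length);
}

// Shuffle a copy of the rows and hold out `holdoutFraction` of them for testing.
// `random` is injectable so that the split can be made reproducible.
function trainTestSplit(rows, holdoutFraction = 0.1, random = Math.random) {
  const shuffled = rows.slice();
  for (let i = shuffled.length - 1; i > 0; i--) {
    // Fisher-Yates shuffle: swap each element with a random earlier (or same) one.
    const j = Math.floor(random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const nTest = Math.round(rows.length * holdoutFraction);
  return {
    train: shuffled.slice(0, rows.length - nTest),
    test: shuffled.slice(rows.length - nTest),
  };
}
```

For the 20,000-row 2019 data set, a 0.1 holdout fraction sets aside 2,000 rows. RMSE is in the same units as the target, and a lower value means predictions closer to the debiased scores.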

Crowd score estimation

AutoAI then preprocessed the features so that models could be trained using various machine learning algorithms. In total, 16 distinct models were trained with various combinations of algorithms, derived features, and hyperparameter optimizations.

Experiment summary

Following the training, each model was evaluated using cross-validation and a holdout sample so that we could identify the top performers.
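The article does not state how many folds AutoAI used or how it assigned rows to folds. Assuming an ordinary k-fold scheme, the sketch below works in three steps:

1. It assigns row indices to k folds, round-robin.
2. It uses each fold once as the validation set.
3. It averages a caller-supplied score, such as RMSE, across the folds.

Models can then be ranked by this cross-validation score and checked against their holdout score.

```javascript
// Split row indices 0..n-1 into k folds for cross-validation.
function kFoldIndices(n, k) {
  if (!Number.isInteger(k) || k < 2 || k > n) {
    throw new Error('k must be an integer between 2 and n');
  }
  const folds = Array.from({ length: k }, () => []);
  for (let i = 0; i < n; i++) {
    // Round-robin assignment; shuffle the rows first if their order is meaningful.
    folds[i % k].push(i);
  }
  // Each fold is the validation set once; the other k - 1 folds form the training set.
  return folds.map((validation, f) => ({
    validation,
    train: folds.filter((_, g) => g !== f).flat(),
  }));
}

// Average a score over all folds; scoreSplit(train, validation) trains and evaluates one split.
function crossValidate(n, k, scoreSplit) {
  const scores = kFoldIndices(n, k).map(({ train, validation }) => scoreSplit(train, validation));
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}
```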

Top performers


Select asset type

From the 16 AutoAI models, the project team selected a model trained with the XGBoost algorithm for our predictions. Through the Cloud Pak for Data user interface, we promoted the selected model into a Watson Machine Learning deployment space and were issued a scoring endpoint.

Debiased score

Create a deployment

Our service returns the model’s predictions in response to POST requests containing one or more feature vectors. Watson Machine Learning secures this endpoint with IBM Cloud Identity and Access Management (IAM) and lets us control scaling from within the UI.
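A client calls such a deployment in two steps:

1. Exchange an IBM Cloud API key for an IAM access token.
2. POST the feature vectors to the scoring URL.

The sketch below assumes Node.js 18 or later for `fetch`. It also assumes the Watson Machine Learning v4 `predictions` endpoint, which takes an `input_data` payload, and the `version` date shown in the comment. Check the exact scoring URL on your deployment's API reference page. Treat any field names you pass in as placeholders for the 2019 CSV columns.

```javascript
// Exchange an IBM Cloud API key for an IAM bearer token.
async function getIamToken(apiKey) {
  const res = await fetch('https://iam.cloud.ibm.com/identity/token', {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded', Accept: 'application/json' },
    body: new URLSearchParams({
      grant_type: 'urn:ibm:params:oauth:grant-type:apikey',
      apikey: apiKey,
    }),
  });
  if (!res.ok) throw new Error(`IAM token request failed: HTTP ${res.status}`);
  return (await res.json()).access_token;
}

// Build a scoring payload: one list of field names plus one value array per feature vector.
function buildScoringPayload(fields, rows) {
  return { input_data: [{ fields, values: rows }] };
}

// POST one or more feature vectors to the deployment's scoring URL.
// scoringUrl is assumed to look like
// https://<region>.ml.cloud.ibm.com/ml/v4/deployments/<deployment_id>/predictions?version=2020-09-01
async function score(scoringUrl, token, fields, rows) {
  const res = await fetch(scoringUrl, {
    method: 'POST',
    headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(buildScoringPayload(fields, rows)),
  });
  if (!res.ok) throw new Error(`Scoring request failed: HTTP ${res.status}`);
  return res.json();
}
```

Because `rows` is an array of feature vectors, several clips can be scored in a single request.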

Workflow integration

After the API endpoint was created using Watson Machine Learning, the IBM iX team replaced the workflow’s call to the crowd noise analysis API with a call to the Watson Machine Learning endpoint. The resulting payload from Watson Machine Learning looked like the following:

  "predictions": [{
    "fields": ["prediction"],
    "values": [[13.29423713684082]]

Crowd noise prediction results were then written to the 2020 Cloudant database for the project. The Round in Under 3 Minutes workflow could then be used in the same way it was used in 2019. The new My Group feature also uses the excitement rankings from this workflow.
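Integrating the model then mostly means reading the predicted value out of a response like the one above and storing it where the old crowd noise score used to go. The sketch below assumes that response layout: a `predictions` array whose first entry carries `fields` and `values`. It writes the value into a hypothetical `emotion_scores.crowd` field and adds a hypothetical `crowd_source` marker. The real 2020 document schema is not given in the article.

```javascript
// Pull the predicted values out of a scoring response shaped like
// { "predictions": [ { "fields": ["prediction"], "values": [[13.29...]] } ] }.
function extractPredictions(response) {
  const block = response.predictions[0];
  const col = block.fields.indexOf('prediction');
  if (col === -1) throw new Error('No "prediction" field in response');
  return block.values.map((row) => row[col]);
}

// Return a copy of a clip document with the predicted crowd score filled in,
// ready to be saved back to Cloudant. Both field names are hypothetical.
function withPredictedCrowdScore(doc, predicted) {
  return {
    ...doc,
    emotion_scores: { ...doc.emotion_scores, crowd: predicted, crowd_source: 'autoai_prediction' },
  };
}
```

Because the function returns a new object instead of mutating its input, the same clip document can be compared before and after the prediction is applied. The returned document keeps its `_id` (and `_rev`, if present), so it can be saved back with an ordinary Cloudant document update.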



AutoAI is an easy-to-use, flexible platform for making predictions from highly structured data. In this case, the IBM iX team was able to use AutoAI to fully replace a missing data input with a prediction based on historical data. This let the team reuse the assets built for previous tournaments, along with their associated benefits. The only coding required was data formatting.