
IBM Developer Blog


Follow your favorite players without missing any of the best moments

With almost 5,000 holes played by 80 – 90 players at the Masters, around 20,000 shots are scored and captured. The majority of the golf shots are acquired on video for distribution around the world. In most cases, the patron is only able to view those shots presented by the broadcaster on produced video channels. When this happens, a patron’s favorite player might be overlooked and many of the other equally exciting moments occurring at the Masters could be missed.

To address this, an artificial intelligence (AI) system built by IBM assigns an excitement score to each golf shot captured on video, giving every shot equal consideration. Once ranked, these shots become candidates for a “round in three minutes” highlight video for each player.

The agile AI architecture uses deep learning techniques to measure the excitement levels of sound, visual content, and motion within each video clip. The multimedia excitement signatures are enriched with 34 situational features such as ball location, shot length, and hole yardage to provide an overall contextual excitement level. In the background, the solution will be monitoring unintended bias to ensure that the results are fair and accurate with respect to overall excitement levels. Patrons around the world will have the opportunity to see automatically curated highlight packages of their favorite players during the tournament.

The overall data flow of a raw video to a 3-minute highlight package follows a workflow pattern that is built on top of IBM Cloudant. Initially, CBS, a television broadcasting company, captures and uploads video of every shot in the 2019 Masters Tournament to a network-attached storage (NAS) device. A Go function then monitors the storage share for new files in MP4 and MPX formats. The Go function uploads the content from the NAS device to a content delivery network (CDN) while creating a pointer record in Cloudant to the video.

The Cloudant record from the previous step is enriched three times with additional data using Cloudant’s update handler features. In each step, Cloudant views are used to place records in processing queues. Records are picked from these queues and enriched with metadata.
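The queue pattern described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the Cloudant API: the `view` and `update_handler` functions, the `stage` field, the stage names `new` and `done`, and the example URL are all hypothetical stand-ins for Cloudant views, update handlers, and the real record layout.

```python
from typing import Callable

# Hypothetical stage names; the source names only the emotion and bias queues.
STAGES = ["new", "emotion", "bias", "done"]


def view(records: list[dict], stage: str) -> list[dict]:
    """Stand-in for a Cloudant view: select the records waiting in one queue."""
    return [r for r in records if r["stage"] == stage]


def update_handler(record: dict, fields: dict) -> None:
    """Stand-in for an update handler: merge new fields, then advance the queue."""
    record.update(fields)
    record["stage"] = STAGES[STAGES.index(record["stage"]) + 1]


def run_step(records: list[dict], stage: str, enrich: Callable[[dict], dict]) -> None:
    """Pick every record from one queue, enrich it, and move it to the next queue."""
    for record in view(records, stage):
        update_handler(record, enrich(record))


# One pointer record, as created when a video is uploaded (URL is made up).
records = [{"_id": "shot-001", "video_url": "https://cdn.example/shot-001.mp4", "stage": "new"}]

run_step(records, "new", lambda r: {"hole": 12, "par": 3})            # context features
run_step(records, "emotion", lambda r: {"multimedia_excitement": 0.8})  # excitement ranking
run_step(records, "bias", lambda r: {"context_score": 3, "debiased_context_score": 3})
```

After the three steps, the single record carries all of the added fields and no queue has work left.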

In the first step, the record is enriched with 34 golf, biographic, and statistical features that are sourced from a DB2 on Cloud database. Each of the enriched records is placed into an emotion queue within Cloudant for highlight ranking. Next, a scene excitement ranking system pulls each new row from the emotion Cloudant queue and creates component-level highlight measures using artificial intelligence. The highlight results are written back to the Cloudant record with an update handler and placed onto a queue for bias processing. A proof-of-concept (PoC) system, which is written in Python, pulls any new record and calls Watson Machine Learning for an overall context highlight score. In the process, IBM Watson OpenScale continuously learns and debiases scores based on selected attributes such as hole number and crowd cheer excitement. The context and debiased context highlight scores are also stored in Cloudant.

As a player’s round finishes, a script aggregates the most exciting shots to create a 3-minute highlight package. This package takes into account business rules, video creative, television graphics, and branding requirements. The finished highlight package is uploaded to a content management system (CMS) for distribution around the world.
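One way to picture the aggregation step is a greedy selection under a time budget. The sketch below is an assumption about how such a script could work, not the production logic: it ignores the business rules and graphics requirements mentioned above, and the field names `order`, `score`, and `duration` are hypothetical.

```python
def build_package(shots: list[dict], max_seconds: float = 180.0) -> list[dict]:
    """Pick the highest-scoring shots that fit in the time budget,
    then return them in the order they were played."""
    chosen, used = [], 0.0
    for shot in sorted(shots, key=lambda s: s["score"], reverse=True):
        if used + shot["duration"] <= max_seconds:
            chosen.append(shot)
            used += shot["duration"]
    return sorted(chosen, key=lambda s: s["order"])


# Illustrative data only: order played, excitement score, clip length in seconds.
shots = [
    {"order": 1, "score": 2.0, "duration": 80.0},
    {"order": 2, "score": 4.0, "duration": 60.0},
    {"order": 3, "score": 1.0, "duration": 60.0},
    {"order": 4, "score": 3.5, "duration": 50.0},
]
package = build_package(shots)
```

Here the first shot is skipped because it would push the package past three minutes, so a shorter, lower-scoring shot takes its place.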

Overview of process

The AI Highlights system uses several deep learning and machine learning techniques to determine the excitement level of a video. Each video is split into its video and sound components. The sound is converted into an MP3 format and placed on a disk store. A Python process picks up the MP3 and sends the content into a convolutional neural network (CNN) called SoundNet with the PyTorch library. The last layer of the CNN is removed to retrieve the spatial representation of the sound. The feature vector is input into a support vector machine (SVM) that was trained on the domain of golf. Two SVMs are applied to produce a crowd cheer and commentator speech excitement score. The score is further scaled to compensate for video sound changes year over year at the Masters.
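The final scoring stage of the sound pipeline can be illustrated with a linear SVM decision function applied to a feature vector. Everything specific here is an assumption made for illustration: the tiny weight vectors, the logistic squashing into a 0 to 1 range, and the `year_scale` factor are hypothetical, and the real system uses SoundNet features with SVMs trained on golf audio.

```python
import math


def svm_decision(weights: list[float], bias: float, features: list[float]) -> float:
    """Linear SVM decision value: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def excitement(weights, bias, features, year_scale: float = 1.0) -> float:
    """Squash the decision value into (0, 1), then apply a per-year scale.
    Both the squashing and the scale factor are assumptions of this sketch."""
    raw = 1.0 / (1.0 + math.exp(-svm_decision(weights, bias, features)))
    return min(1.0, raw * year_scale)


# Hypothetical 3-dimensional feature vector standing in for the CNN output.
features = [0.2, 0.9, 0.4]

# Two separate models, one per score, as described in the text.
cheer_score = excitement([1.5, 2.0, -0.5], -1.0, features)
speech_score = excitement([-0.5, 0.5, 1.0], 0.0, features, year_scale=1.1)
```

Keeping the two models separate means the crowd cheer and commentator scores can be weighted or monitored independently downstream.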

The visual aspects of the video are analyzed from extracted video frames. Each image is sent into the VGG-16 model within the Caffe framework that was pre-trained on ImageNet. The VGG-16 was adapted to recognize exciting golf gestures. An action excitement score is scaled to provide a score for a golfer. The same set of images is used to determine facial expressions of a golfer as well as body part detection. Portions of the body such as the head and torso are tracked to determine the speed of motion. The combination of body part motion and facial expression detection provides an overall subject excitement.
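The speed-of-motion idea can be shown with a small calculation over tracked positions. This sketch assumes that a body part detector has already produced one (x, y) pixel position per frame. The function name and the pixels-per-second unit are choices made for illustration only.

```python
import math


def mean_speed(positions: list[tuple[float, float]], fps: float) -> float:
    """Average speed of a tracked point in pixels per second."""
    if len(positions) < 2:
        return 0.0
    # Sum the distance travelled between consecutive frames.
    distance = sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))
    # Elapsed time covers (n - 1) frame intervals of 1 / fps seconds each.
    return distance * fps / (len(positions) - 1)


# Hypothetical head positions over three consecutive frames at 30 frames per second.
head_track = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
head_speed = mean_speed(head_track, fps=30.0)
```

A fist pump or a jump produces a much higher speed than a golfer standing still, which is what makes this a usable excitement signal.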

The individual scores from cheer, action, body, and facial analysis are fused into an overall multimedia excitement score. Each excitement score is saved into the Cloudant data store for downstream processing.

Process description

Another Python application deployed as a Cloud Foundry application on the IBM Cloud polls Cloudant for records to remove unintended bias and alter unfair excitement levels. Several artificial intelligence technologies within Watson OpenScale detect fairness and correct the overall context excitement level with mitigation techniques while monitoring model accuracy. The 34 context features are combined with the multimedia scores to measure and remove bias. For example, the excitement level from the sound of cheering might be biased because fan favorite players tend to have larger crowds than lesser-known players. As a result, the cheering sound predicts a popular player will have a more exciting shot than a lesser-known player because the cheer is louder. The Python application provides an overall context excitement score by calling a trained SVM that was deployed on Watson Machine Learning. Each scoring payload is sent to Watson OpenScale for continual bias and accuracy measures. Watson OpenScale trains a postprocess debias model that removes bias from the score given a set of monitored attributes.

An ontology of golf context attributes was defined to enumerate protected attributes that can lead to both an overall and a debiased context score. During the Masters, the tournament state, such as the hole number, player rank, and stroke number of the player hitting the shot in the video under analysis, provides excitement predictors. More detailed information about each hole, including its par and yardage, is included in the contextual excitement score. Additional attributes that summarize the outcome of a shot, such as whether the shot hit the fairway, landed in the water, or was a sand save, provide player performance indicators. Other traditional protected attributes, such as player ethnicity or age, are also considered; these partition people into groups that have parity in terms of benefit.

Golf context

Throughout tournament play, Watson OpenScale will monitor the bias of context scores based on two selected attributes: cheer excitement score and hole number. We want to ensure that the highlight package includes players that have large and small crowds as well as holes outside of the Amen Corner, 16, and 18. Watson Machine Learning provides an overall context excitement score that ranges from 0, the least exciting, to 4, the most exciting. The reference group for crowd score is selected to be [0.3,0.6], where 0 means there is no crowd noise and 1 is the most. We thought that the monitored groups of [0,0.29] and [0.61,1] would either be biased for or against crowd size. As such, the bias found by Watson OpenScale will slightly change the output of the overall context score so the biased score decreases. In the following image, a new post processor model decreases the overall crowd noise bias by 43%.

Generally, the most popular holes at the Masters include the Amen Corner (holes 11, 12, 13), 16, and 18. We wanted to ensure that any other unprivileged hole has equal excitement equity during shots. As a result, Watson OpenScale created a post processor model to have an improved disparate impact score based on the hole number. The slightly adjusted debiased score will not compromise accuracy.
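Disparate impact is commonly defined as the rate of favorable outcomes in the monitored group divided by the rate in the reference group, with 1.0 meaning parity. The sketch below applies that definition to the crowd score ranges given above. The sample rows and the choice of a context score of 3 or more as the favorable outcome are assumptions made for illustration, and this is not how Watson OpenScale is invoked.

```python
def favorable_rate(rows, low, high, threshold=3):
    """Share of shots in a crowd-score range whose context score is favorable."""
    group = [score for crowd, score in rows if low <= crowd <= high]
    return sum(score >= threshold for score in group) / len(group)


def disparate_impact(rows, monitored, reference, threshold=3):
    """Favorable rate of the monitored group divided by that of the reference group."""
    return favorable_rate(rows, *monitored, threshold) / favorable_rate(rows, *reference, threshold)


# Illustrative (crowd score, context score) pairs; not tournament data.
rows = [
    (0.10, 1), (0.20, 3), (0.25, 0), (0.15, 2),  # quiet crowds: 1 of 4 favorable
    (0.40, 3), (0.50, 4), (0.45, 1), (0.55, 2),  # reference range: 2 of 4 favorable
]
di_quiet = disparate_impact(rows, monitored=(0.0, 0.29), reference=(0.3, 0.6))
```

In this made-up sample, shots with quiet crowds receive a favorable score half as often as shots in the reference range, which is the kind of gap a debiasing post processor aims to narrow.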

Golf model deployment transactions

After a golf scene has a ranked excitement, we can answer the question of “why?” through attribute contribution transparency. For a highlight that was given a low excitement ranking, we can determine the components that supported or refuted the algorithm’s decision. In the following image, you can see that the low gesture and speaker sound scores support the low ranking. The shot outcome of it not being a bogey or landing on the green negatively affected the excitement level. However, the average crowd noise and the high number of years this particular golfer has been a professional support a higher excitement level. Because this shot was not a putt within the context of the other factors, the algorithm has evidence to support a higher excitement score.

Confidence level

In another explainability example, Watson OpenScale was 99.49% confident in the highest excitement score of 4.0. The leading confidence contribution for the 4.0 score was that the shot was a long putt on the green. The next two highest predictors were a crowd score of 1.0 and a gesture score of 0.96, both out of 1.0. Most notably, this shot was on a hole where the golfer scored an eagle or better. Confidence against the 4.0 score was accrued because this shot was not the last shot of the hole and it was not from the fairway.

Confidence level

Each of the highlights will be available for retrieval based on query terms and filters. The highest scoring highlights about a player on a specific hole can be reviewed. To go a step further, we use a Python script with the Python Imaging Library (PIL) and real-time scoring data to generate broadcast-quality graphics. The AI-selected clips are then composited together using the MLT multimedia framework to include broadcast graphics, pre-rendered player introductions, and title cards. After the highlight package is uploaded to the CDN, our CMS is notified of the content. Now, the transcendent moments are ready for distribution around the world.

The AI-curated highlights and round in three minutes will tell the story of the 2019 Masters Tournament by analyzing the entire tournament. Now you can follow your favorite players without missing any of the best moments.