For two weeks in the summer, the center of the sporting world is the US Open. Millions of tennis fans follow along hoping to feel close to the matches, wherever they are in the world. They want scores, they want player and tournament information, and they want highlights. Capturing all the action at the event—with matches on up to 17 courts simultaneously, and up to six matches per court per day—is a monumental video production effort. This year, one of the innovations IBM is bringing to the US Open to aid in that effort is Cognitive Highlights, a solution that ensures fans can see all the tournament’s best moments.
So far at this year’s tournament, the Cognitive Highlights system has clipped over 25,000 points from more than 300 hours of coverage of men’s and women’s singles matches, many of them played simultaneously. The system uses deep learning models and “self-supervised” active learning techniques to recognize which of these points are significant and to understand what makes a good highlight. Using Watson APIs, it understands the importance of certain scores—like a point that clinches a set—and uses visual and audio cues to create ratings for each point. The output is a one-of-a-kind system for ranking exciting moments and auto-curating a highlights package.
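The score-importance idea can be pictured with a small sketch. Everything here is an illustrative assumption, not IBM’s implementation: it flags a point as set-clinching under standard tennis scoring (first to six games with a two-game margin, or a tiebreak at 6–6).

```python
def clinches_set(games_winner: int, games_loser: int) -> bool:
    """Return True if winning the current game would clinch the set.

    games_winner / games_loser: games already won in the set by the
    player about to win this game and by their opponent. A simplified
    standard-scoring rule, not the production logic.
    """
    after = games_winner + 1
    # Normal set win: reach at least 6 games with a two-game margin.
    if after >= 6 and after - games_loser >= 2:
        return True
    # Winning the tiebreak game at 6-6 takes the set 7-6.
    if games_winner == 6 and games_loser == 6:
        return True
    return False


def score_importance(games_winner: int, games_loser: int) -> float:
    """Hypothetical importance boost for significant scores."""
    return 1.0 if clinches_set(games_winner, games_loser) else 0.0
```

In the real system this kind of rule would be one signal among many, feeding into the per-point rating alongside the visual and audio cues described below.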
For visual cues, technicians trained Watson using Visual Recognition to identify when a player is performing actions that typically mark an exciting moment in the match—celebrating, waving to the crowd, pumping a fist, and so on. The system also uses facial recognition to read the emotional reactions of the players.
Visual Recognition also helps Watson divide each match into individual points: it reads the camera placement and zoom that frame the scene at the start of each point, so it knows where each clip should begin.
For sound classification, the team worked with MIT to develop a deep neural network called “SoundNet” for analyzing environmental sounds such as crowd noise. Refining the system to adapt to different environments—varying crowd sizes and makeups—lets it monitor crowd excitement over the course of a match.
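A much simpler stand-in conveys the intuition. The sketch below scores windows of audio by RMS loudness against a per-venue baseline; it is an assumption-laden proxy, since SoundNet is a learned classifier of sound content, not a loudness meter, and the window size and baseline here are invented.

```python
import math


def crowd_excitement(samples, window=4800, baseline=0.01):
    """Rough crowd-excitement proxy: RMS loudness per fixed window,
    scaled against a venue-specific baseline (both values are
    illustrative). Returns one score per complete window.
    """
    scores = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(x * x for x in chunk) / window)
        scores.append(min(rms / baseline, 10.0))  # cap the scale
    return scores


# Usage: a quiet stretch followed by a roar scores low, then high.
quiet = [0.001] * 4800
roar = [0.05] * 4800
scores = crowd_excitement(quiet + roar)
```

Adapting to “different crowd sizes and makeups” corresponds to re-estimating that baseline (and, in the real system, retraining the classifier) per environment.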
Finally, the system has a sophisticated analytic model that analyzes the statistics that correlate with important moments in a match. Not all winners are equal in impact, and the model keeps the focus on true turning points and defining moments.
The data from the visual, audio and statistical cues are combined, and each clip is scored on Overall Excitement. These clips then go to the USTA’s editorial team, who are spared the burden of watching hours of simultaneous video streams and can focus on telling the right story. Highlight packages are made quickly, tailored around specific players and sent out to the world of expectant fans through app notifications, social media, player bio pages on the US Open digital platforms and elsewhere.
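The fusion step can be sketched as a weighted combination of the three cue scores, followed by a top-N selection per player. The field names, weights and linear form are assumptions for illustration; the production model is learned, not hand-tuned.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    player: str
    visual: float   # 0-1: player reaction cues (celebration, etc.)
    audio: float    # 0-1: crowd-noise excitement
    stats: float    # 0-1: statistical importance of the point


# Hypothetical hand-picked weights for the sketch.
WEIGHTS = {"visual": 0.4, "audio": 0.3, "stats": 0.3}


def overall_excitement(clip: Clip) -> float:
    """Linear fusion of the three cue scores into one rating."""
    return (WEIGHTS["visual"] * clip.visual
            + WEIGHTS["audio"] * clip.audio
            + WEIGHTS["stats"] * clip.stats)


def highlight_package(clips, player=None, top_n=5):
    """Auto-curate: optionally filter to one player, rank by
    Overall Excitement, keep the top clips."""
    pool = [c for c in clips if player is None or c.player == player]
    return sorted(pool, key=overall_excitement, reverse=True)[:top_n]
```

The per-player filter mirrors how packages are tailored around specific players before being pushed out to fans.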
There’s a lot more we can do with the Watson Media platform. Wherever video is being recorded, context can be understood. These cognitive solutions can help monitor, categorize and derive meaning and insight quickly. There are potential applications for broadcasters, media, entertainment, security, education, industry, retail—and more. As the use of video continues to increase across all sectors, so will the applications of this technology, which inspires us to keep refining these solutions and making them more powerful and more helpful to the people who depend on them.