This year, IBM partnered with The Recording Academy, the organization that conducts the GRAMMY Awards, to demonstrate how AI capabilities can help the organization on the biggest night in music. For The Recording Academy, this is its largest and most complex project. IBM set out to categorize and enrich the Academy’s content so that the best pieces of digital content could be identified for use within its ecosystem.
In this blog post, we describe how the team from IBM iX (Interactive Experience) used AI from IBM’s cloud-based Watson Media offerings to enrich the event’s photo curation process and the video content from the GRAMMY Awards red carpet in real time across TV, social media, and digital platforms. We designed and implemented a real-time Photo Enrichment System that processes a large volume of images from the event in New York, which expedites editorial workflows and streamlines production processes. Further, we discuss our world-class user experience.
Every picture tells a story
During the night of the GRAMMYs, there are over 5-1/2 hours of live red carpet coverage, and during that time more than 100,000 images are captured and uploaded. Prior to 2018, the Recording Academy used third-party photography services that aggregated the images, split the photos into multiple views, and batch uploaded them to a central repository. The Recording Academy content team then examined all of the images to determine whether they were suitable for use on the GRAMMYs site. The process was extremely time-consuming and delayed the Recording Academy’s ability to create timely experiences for fans.
From an artistic and user experience perspective, our AI system comprehends and enriches each 2018 GRAMMY Awards photo with an interpretation to tell a story. Our AI system uses computer vision, image analysis, fashion analysis, and deep facial recognition to provide contextual awareness for each photo in real time. Additionally, AI computing techniques allow us to create experiences that bring fans closer to the artists.
The IBM iX team designed an AI-enhanced image comprehension solution that interprets who is in the photo, what they are wearing, and whether they are on the red carpet. We built the integration framework on the IBM Cloud to connect several AI components together on a Liberty server with a CouchDB data store.
The diagram below shows the high level components developed for the system.
The solution has an on-premises app running in New York that watches an existing drop folder for new GRAMMY Awards images. The app, written in Go, sends the images to a raw bucket on IBM Cloud Object Storage (COS). Once an image is uploaded to IBM COS, the app sends a POST request to our Workflow System. The manifest in the request tells the Workflow System where it can retrieve the image for processing.
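As a rough illustration of that hand-off, the sketch below builds a manifest and prepares the POST request without sending it. The real app is written in Go; Python is used here only for illustration. The endpoint URL, bucket name, object key, and manifest fields are all assumptions, since the post does not describe the actual request format.

```python
import json
import urllib.request

# Hypothetical endpoint; the post does not give the Workflow System's URL.
WORKFLOW_URL = "https://workflow.example.com/ingest"


def build_manifest(bucket: str, key: str) -> dict:
    """Describe where the Workflow System can fetch the uploaded image.

    The field names are assumptions made for this sketch.
    """
    return {"bucket": bucket, "key": key}


def build_ingest_request(manifest: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST that announces a new image."""
    body = json.dumps(manifest).encode("utf-8")
    return urllib.request.Request(
        WORKFLOW_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Made-up bucket and key, for illustration only.
request = build_ingest_request(
    build_manifest("grammys-raw", "red-carpet/img_0001.jpg")
)
# Sending it would be: urllib.request.urlopen(request)
```

Keeping the manifest small (a pointer to the object rather than the image bytes) means the request stays cheap even when many images arrive at once.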
The ingest process pulls in the metadata about each image and creates a new image asset document within CouchDB. The database view updates to show the newly arrived, unprocessed asset. We use a Java servlet to start up a scheduler called a managed executor. Every 30 seconds, the Workflow Manager executes a process to identify new assets for processing.
The Workflow Manager retrieves assets from CouchDB, hosted on the IBM Cloud, and sends each image to the correct AI pipeline. Each step within the AI pipeline discovers new features about the image, which are translated into metadata. The enriched metadata for the photo is then written back to our database.
After each AI computing component has been applied to all images, we rank the images based on an image quality fusion score. A Representational State Transfer (REST) Application Programming Interface (API) provides an image retrieval system so that users can request the most relevant results through the Photo UI with natural language such as “Find me all images of Adele”.
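The polling loop can be pictured with the short sketch below. It is not the Java managed executor the team used; it is a plain Python loop under assumed names. The `status` field, its values, and the `fetch_documents` and `dispatch` callables are hypothetical stand-ins for the CouchDB view and the AI pipelines.

```python
import time


def find_unprocessed(documents):
    """Return asset documents that no pipeline has picked up yet."""
    return [doc for doc in documents if doc.get("status") == "new"]


def run_workflow_manager(fetch_documents, dispatch, interval_seconds=30,
                         max_cycles=None, sleep=time.sleep):
    """Poll for new assets on a fixed interval and hand each one to dispatch.

    max_cycles and sleep are injectable so the loop can be exercised
    without actually waiting 30 seconds per cycle.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for doc in find_unprocessed(fetch_documents()):
            doc["status"] = "processing"  # keep it from being picked up twice
            dispatch(doc)
        cycles += 1
        sleep(interval_seconds)
```

Marking an asset as in progress before dispatching it is one simple way to make sure the next cycle does not send the same image through the pipeline again.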
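A minimal sketch of the retrieval step might look like the following. It assumes the natural language request has already been reduced to a person's name, and that each asset carries a `people` list and a `score` field; those field names and the sample records are made up for illustration.

```python
def find_images(assets, person):
    """Return assets that show `person`, best fusion score first."""
    wanted = person.strip().lower()
    matches = [
        asset for asset in assets
        if wanted in (name.lower() for name in asset.get("people", []))
    ]
    return sorted(matches, key=lambda asset: asset["score"], reverse=True)


# Made-up sample records, for illustration only.
assets = [
    {"id": "img-1", "people": ["Adele"], "score": 0.62},
    {"id": "img-2", "people": ["LeBron James"], "score": 0.90},
    {"id": "img-3", "people": ["Adele"], "score": 0.88},
]
top = find_images(assets, "Adele")
```

Sorting by the fusion score after filtering is what puts the most useful photos of a given artist at the top of the results.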
AI Image Enrichment
We use several AI algorithms within an analytical pipeline for precise identification, broad identification, fashion analysis, and red carpet determination within an individual photograph. First, each image is simultaneously sent to two facial recognition algorithms. Leveraging an experimental API from IBM Research, a deep learning model projects the image through a convolutional neural network to measure how similar it is to other pictures of GRAMMY musical artists. The second facial recognition algorithm searches a gallery of over 40,000 celebrities for broad coverage. The combination of the facial recognition algorithms provides a precise and broad measure of identification.
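One way to picture the two recognizers working side by side is the sketch below. The `deep_face` and `broad_face` callables are hypothetical stand-ins for the two services, and the merge rule (prefer the precise result, fall back to the broad one) is an assumption rather than the team's documented logic.

```python
from concurrent.futures import ThreadPoolExecutor


def identify(image, deep_face, broad_face):
    """Run both recognizers at once; prefer the precise one, else the broad one.

    Each recognizer is a callable that returns a name or None.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        precise = pool.submit(deep_face, image)
        broad = pool.submit(broad_face, image)
        precise_name, broad_name = precise.result(), broad.result()
    if precise_name is not None:
        return {"name": precise_name, "source": "deep_face"}
    if broad_name is not None:
        return {"name": broad_name, "source": "broad"}
    return {"name": None, "source": None}
```

Submitting both calls before waiting on either means the slower service sets the total latency, instead of the two latencies adding up.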
Next, the image is sent to a fashion analysis algorithm for the person identified within the photo. The service determines color dominance and pixel features based on clothing. Similar matches of outfit style and score are returned to the pipeline. Another important aspect we measure is whether the person in the image is on the red carpet. A Watson classifier has been trained to give us a confidence level that the GRAMMYs’ famous red carpet is present or not. Finally, the image blur and contrast are measured to provide a photograph professionalism score. The features that are discovered within the pipeline are fused together into an overall image quality score and used to create rich user experiences.
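The post does not say how blur and contrast are measured. As an illustration only, the sketch below uses two common heuristics: the variance of the Laplacian as a sharpness measure (a blurry image has few edges, so the variance is low) and RMS contrast (the standard deviation of pixel intensities). The input is assumed to be a grayscale image given as a list of rows, at least 3 by 3 pixels.

```python
from statistics import pstdev


def laplacian_variance(gray):
    """Sharpness heuristic: variance of a 4-neighbour Laplacian response.

    Only interior pixels are visited, so `gray` must be at least 3x3.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def rms_contrast(gray):
    """Contrast heuristic: population standard deviation of all pixels."""
    return pstdev(pixel for row in gray for pixel in row)
```

A perfectly flat image scores zero on both measures, while an image with strong edges scores high on both, which is the direction a professionalism score would need.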
The AI service pipeline includes:
1. Deep Face Recognition – Using an experimental deep learning system based on Caffe, we trained Watson on all of the nominees and key music celebrities. When processing photos, the algorithm identifies each person, their location, and key attributes such as legs, arms, and eyes within the image. The system is distributed over 4 GPU-based machines and 40 Docker containers.
2. Watson Visual Recognition – To complement the Deep Face Recognition, we used Visual Recognition to give us broad coverage when identifying celebrities. We use the algorithm to include surprise guests such as LeBron James. Watson was able to fill in any gaps left by our precision-oriented Deep Face Recognition.
3. Red Carpet Analysis – We used a Watson Visual Recognition custom classifier to determine if a photo was taken at the classic red carpet location. The red carpet feature was a good predictor of a high-quality image.
4. Fashion Recognition – If the image meets our requirements of showing a single individual who is on the red carpet, then we send the image to our Fashion Recognition service to identify the style and dominant colors being worn. The fashion analysis enables fans to find trends in style across other people that were or are on the red carpet.
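The gating rule in step 4 can be sketched as follows. The threshold value, the asset field names, and the `fashion_service` callable are hypothetical; the post only states that the image must show a single individual on the red carpet.

```python
RED_CARPET_THRESHOLD = 0.5  # hypothetical cut-off, not taken from the post


def needs_fashion_analysis(people, red_carpet_confidence,
                           threshold=RED_CARPET_THRESHOLD):
    """Fashion Recognition runs only for one person on the red carpet."""
    return len(people) == 1 and red_carpet_confidence >= threshold


def enrich_with_fashion(asset, fashion_service):
    """Call the fashion service only when the gating rule is satisfied."""
    if needs_fashion_analysis(asset["people"], asset["red_carpet_confidence"]):
        asset["fashion"] = fashion_service(asset)
    return asset
```

Checking the cheaper results first keeps group shots and off-carpet photos from ever reaching the fashion service.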
We aggregate the AI evidence from all of the enrichment sources to determine an image score. The score represents the image quality so that feature-rich photographs are uncovered from the large volume of data. For example, a non-blurry photo where an individual is well centered in the image with a good view of his or her complete face would rank higher than a profile-based image. The evidence-based image score provides a sorting mechanism for The Recording Academy team to quickly get to the best images taken at the GRAMMY Awards.
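The actual fusion formula is not published in this post. A weighted average is one simple way such evidence could be combined, and the sketch below shows that idea with invented evidence names and weights.

```python
# Hypothetical evidence names and weights; the real fusion formula
# is not described in the post.
WEIGHTS = {
    "face_confidence": 0.35,
    "frontal_face": 0.20,
    "centered": 0.15,
    "sharpness": 0.20,
    "red_carpet": 0.10,
}


def image_score(evidence):
    """Weighted average of evidence values, each expected in the range 0..1.

    Missing evidence counts as 0, so sparsely enriched photos rank lower.
    """
    total_weight = sum(WEIGHTS.values())
    weighted = sum(WEIGHTS[name] * evidence.get(name, 0.0) for name in WEIGHTS)
    return weighted / total_weight
```

With this shape, two photos that differ only in how much of the face is visible will be ordered by that single piece of evidence, which matches the frontal versus profile example above.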
User Centered Experience
The majority of the global audience for the GRAMMY Awards is remote and has a high demand for digital content. Throughout the show, they can access the digital services and experiences that we developed, which consume the AI algorithms and the information extracted from each image.
The GRAMMY Awards is a celebration of art that includes music and fashion. To create an experience around fashion, we developed a GRAMMY Fashion visualization that allows users to explore similar clothing trends for their favorite artist. The retrieved results show artists on the red carpet that have the same clothing styles. Fans can get their own fashion ideas by mixing and matching different clothing from similar trends.
Users can journey back through the years to explore a lyrical analysis of artists at the GRAMMY Awards. We analyzed previous song lyrics for each GRAMMY nominee to discover emotional tones such as joy, disgust, anger, fear, and sadness. A summarization score for each emotion dimension is available for the album. Audience members and remote attendees can now understand how their favorite artist’s musical composition has changed through time.
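The post does not say how the per-album summarization score is computed. Assuming each song already has a score per emotion, a simple average across the album's songs is one plausible reading, sketched below.

```python
from statistics import mean

EMOTIONS = ("joy", "disgust", "anger", "fear", "sadness")


def summarize_album(song_scores):
    """Average each emotion across the songs of one album.

    song_scores is a list of dicts, one per song, keyed by emotion name.
    Averaging is an assumption; the post only says a summary score exists.
    """
    return {
        emotion: mean(song[emotion] for song in song_scores)
        for emotion in EMOTIONS
    }
```

Comparing these summaries album by album is what would let a fan see an artist's emotional tone shift over time.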
Throughout the event, live streaming video is available as a service over the Internet. IBM Watson Media generates automated highlights of unforgettable moments on the red carpet. Within the video, we recognize the identity of the artist with the deep facial recognition algorithm. Next, we transcribe the speech to text to check that the subject we identified is mentioned during the conversation. Finally, we apply Optical Character Recognition (OCR) to each video frame to discover the name of the person, which is usually displayed in the lower right corner of the video. Once our system has enough confidence that we have recognized the person in the video, we create a highlight clip.
We find the start point of the highlight clip by tracking the artist’s head backward through the video stream: when the artist is no longer in the reversed video, we mark the start point. Similarly, we continue monitoring the live feed until the artist’s face is no longer within the video, and at that point we mark the end point. Splicing out the video between the start and end points creates the unforgettable red carpet moment. The Recording Academy video production team can explore the video stream by navigating through highlight clips on our GRAMMY Live Video Experience.
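How the three signals are weighed is not spelled out here. One simple rule, shown purely as an assumption, is to require a confident face match plus at least one corroborating mention of the name, either spoken or on screen. The threshold and function name are hypothetical.

```python
FACE_THRESHOLD = 0.8  # hypothetical confidence cut-off


def should_create_highlight(artist, face_name, face_confidence,
                            transcript, ocr_text):
    """Decide whether the evidence is strong enough to cut a highlight.

    Assumed rule: the face match must be confident, and the artist's
    name must also appear in the transcript or in the on-screen text.
    """
    face_ok = face_name == artist and face_confidence >= FACE_THRESHOLD
    spoken = artist.lower() in transcript.lower()
    on_screen = artist.lower() in ocr_text.lower()
    return face_ok and (spoken or on_screen)
```

Requiring agreement between independent signals reduces the chance that a look-alike in the crowd triggers a clip.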
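The boundary search can be sketched as below. It assumes the tracker's output has already been reduced to one boolean per frame (artist visible or not) and that `detected_at` is the frame where the artist was confirmed; both are simplifications made for this example.

```python
def clip_bounds(face_present, detected_at):
    """Return (start, end) frame indexes of the highlight, inclusive.

    Walk backward from the detection frame until the artist disappears
    (the reversed-video step), then forward until the artist leaves.
    """
    if not face_present[detected_at]:
        raise ValueError("artist must be visible at the detection frame")
    start = detected_at
    while start > 0 and face_present[start - 1]:
        start -= 1
    end = detected_at
    while end + 1 < len(face_present) and face_present[end + 1]:
        end += 1
    return start, end
```

For example, with frames `[False, False, True, True, True, True, False, False]` and a detection at index 4, the clip spans frames 2 through 5.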
A World-Class Experience Is Delivered
AI solutions come to life when they help people make better decisions, turn information into insight, and create lasting user experiences. The goal of the AI-Enriched Photography System is to give the Recording Academy team a tool to help them work through their data avalanche of images. As a result, we relieved the data sorting pressure so that they could focus on their core mission of delivering the best digital content. The Fashion Recognition, Song Lyric Analysis, and GRAMMY Live Video experiences produce a deep, intuitive understanding of each GRAMMY nominee in real time. The combination of AI and innovation produced a world-class GRAMMY Awards experience for The Recording Academy’s global audience.
Go to GRAMMY.com/Watson to enjoy the Fashion and Lyric analysis.
The IBM delivery team includes: Brian Bacheller, Aaron Baughman, Leonard Flournoy, Micah Forster, Sean Goss, Gary Guerino, Stephen Hammer, Tony Johnson, David Provan, Corey Shelton, and Ryan Whitman.
The IBM research team includes: Ayushi Dalmia, Quanfu Fan, Rogerio Feris, Michele Merler, Nalini Ratha, Vikas Raykar, Chiao-Fe Shu, and John R Smith.
The IBM hosting team includes: Michael Choong, Chris Kalamaras, and Dick Locke.