Timeless AI insights at the GRAMMYs: Changing the fan experience

With contributions by Susanne Bull, Stephen Hammer, Tony Johnson, Sara Perelman, Corey Shelton, Tyler Sidell, and Matt Stark.

The 2020 awards season is upon us, and Hollywood is rolling out the red carpet for music’s biggest night. No matter the awards show, many fans tune in to watch interviews of the nominees as they arrive for the event. And this year, the Recording Academy (RA) is working with IBM to go beyond questions like “Who are you wearing tonight?” to share deeper insights about all of the GRAMMY-nominated artists.

For the 62nd GRAMMY Awards, IBM® has partnered with the Recording Academy to bring fans in-depth, data-driven analysis during the traditional red carpet coverage on GRAMMY.com, where IBM Watson will serve up deep insights on the artists and their music.

This unique red carpet experience, “GRAMMY Insights with IBM Watson,” will provide dedicated music fans with a deeper look into their favorite artists, including insights like where and how they got their start, which venues they’ve performed at, and even early career connections with other artists. IBM Watson will analyze topics discussed on the red carpet in real time, pulling from millions of data points to surface insightful information and enabling a better, more engaging experience for the millions of red carpet livestream viewers and music fans around the world.

With IBM Watson™ Discovery and an IBM hybrid cloud architecture, we’ve transformed IBM Watson into the ultimate red carpet host. Tune in to the webcast on January 26 at 5 PM ET to catch the livestream event, as well as all of the buzz-worthy content (on-demand videos and photos) across GRAMMY.com. In the meantime, we’d like to share the technology and architecture behind the tool.

Overall system architecture

We developed a system, described in Figure 1, that enables the Recording Academy to engage its fan base while also broadening its viewership through all-new, Watson-curated insights. The workflow we developed encourages deeper fan engagement through consumer interfaces while also scaling the expertise of the RA’s editorial team. To achieve both goals, the AI technology had to be capable of running anywhere, on multiple clouds. The flexibility of running workloads on Red Hat OpenShift on IBM Cloud creates a data refinery for producing rich insights. As insights are viewed, fans are able to relate and connect to award nominees as they walk down the red carpet. These insights discover and reveal the human journey toward a nomination while magnifying the many unique facets of a selected artist’s life.

The system architecture can be split into two phases. The first is the curation of insights before the GRAMMY Awards. A corpus of information was generated from over 100,000 news sites, Wikipedia, and grammy.com. Each article was summarized using Watson Discovery and Watson Natural Language Understanding. Extractive summaries were created by selecting the most probable sentences scored over six features:

  1. A unigram language model that measures whether the information in a sentence covers the query
  2. A unigram language model that measures the mass a summary devotes to the query
  3. A measure of how well the summary covers the document set
  4. A bigram language model entropy measure
  5. A bias toward sentences that occur earlier in an article
  6. A bias toward longer sentences

The summarization algorithm is unsupervised and does not require domain knowledge. The algorithm is fast, accurate, and can be used in any domain.
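
To make the scoring concrete, the following is a minimal, self-contained sketch of query-focused extractive summarization. It uses simplified stand-ins for the six features above and fixed, hypothetical weights; the production summarization in our pipeline is more sophisticated.

// A simplified sketch of query-focused extractive summarization (not the production code):
// each sentence gets a weighted sum of rough stand-ins for the six features,
// and the top-ranked sentences form the summary.
function summarizeExtractively(sentences, query, k = 3) {
    const queryTerms = new Set(query.toLowerCase().split(/\s+/))
    const corpusTerms = new Set(sentences.join(' ').toLowerCase().split(/\s+/))
    const scored = sentences.map((sentence, index) => {
        const terms = sentence.toLowerCase().split(/\s+/)
        const unique = new Set(terms)
        // 1. Query coverage: fraction of query terms that appear in the sentence
        const coverage = [...queryTerms].filter(t => unique.has(t)).length / queryTerms.size
        // 2. Query mass: fraction of the sentence devoted to query terms
        const mass = terms.filter(t => queryTerms.has(t)).length / terms.length
        // 3. Document-set coverage: fraction of the corpus vocabulary in the sentence
        const docCoverage = unique.size / corpusTerms.size
        // 4. Bigram diversity as a rough stand-in for bigram entropy
        const bigrams = new Set(terms.slice(1).map((t, i) => terms[i] + ' ' + t))
        const entropy = bigrams.size / Math.max(terms.length - 1, 1)
        // 5. Bias toward sentences that occur earlier in the article
        const position = 1 / (index + 1)
        // 6. Bias toward longer sentences (capped)
        const length = Math.min(terms.length / 40, 1)
        const score = 0.3 * coverage + 0.2 * mass + 0.2 * docCoverage +
            0.1 * entropy + 0.1 * position + 0.1 * length
        return { sentence, score }
    })
    return scored.sort((a, b) => b.score - a.score).slice(0, k).map(s => s.sentence)
}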

Next, the summaries are turned into factoids with natural language processing techniques. We attempt to resolve coreferences by substituting proper nouns for pronouns. The gender of a proper noun is determined by AI to help match the appropriate substitutions. Vulgar words and inappropriate slang are removed from the factoid.
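
The following is a simplified illustration of this cleanup step, not the production NLP code: subject pronouns matching the classified gender are replaced with the artist’s name, and factoids containing blocked terms are dropped. The blockedWords list is a hypothetical placeholder, and real anaphora resolution is considerably more involved.

const blockedWords = ['exampleBlockedWord'] // hypothetical placeholder list

function cleanFactoid(summary, artistName, gender) {
    // Substitute the artist's name for subject pronouns matching the classified gender
    const pronoun = gender === 'female' ? /\bshe\b/gi : /\bhe\b/gi
    const factoid = summary.replace(pronoun, artistName)
    // Discard the factoid entirely if it contains vulgar or inappropriate terms
    const lower = factoid.toLowerCase()
    return blockedWords.some(word => lower.includes(word)) ? null : factoid
}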

In the next step, the factoids are turned into insights through a machine learning pipeline. A deep neural network built with IBM NeuNets classifies each insight as good or bad with a probability. If the insight is classified as good, IBM Natural Language Classifier then assigns a category. The category can be interesting, quote, record breaking, biographic, career stat, inspiration, artist connection, charity, breakthrough, nonmusical, or scandal. The category helps ensure that we have diversity within the selected insights. All of the insights are orchestrated through asynchronous calls in a Node.js application and stored in a Cloudant database.
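
As a rough sketch of that last step, the following shows how an insight document might be written to Cloudant from Node.js using the @cloudant/cloudant library; the database name, credential handling, and document fields are assumptions for illustration.

const Cloudant = require('@cloudant/cloudant')

async function storeInsight(insight) {
    // The service URL (including credentials) is assumed to come from the environment
    const cloudant = Cloudant({ url: process.env.CLOUDANT_URL })
    const db = cloudant.db.use('insights') // hypothetical database name
    // Each insight is stored as a JSON document with its quality and category
    return db.insert({
        artist: insight.artist,
        text: insight.text,
        quality: insight.quality,
        category: insight.category,
        createdAt: new Date().toISOString()
    })
}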


Figure 1. The overall architecture of the Timeless AI Insights system

Human operators have the opportunity to review the highest-quality factoids in any relevant category. When factoids are approved, they are sent to an administration tool for publication. At this point in the publication flow, we are within the live broadcast. The human operator uses real-time speech to text on the HTTP Live Streaming (HLS) m3u8 streams to determine when to inject an insight into the livestream. In another insight access point, the insights are overlaid on top of Video on Demand (VoD), photos, and articles. These insights are stored within artist JSON files on the content distribution network (CDN) so that they can be inserted into an iframe. At the same time, the insight is pushed to a video compositor and transcoder to burn the insight into the livestream. In the future, a scheduled job will invoke the insight pipeline for popular artists to get late-breaking insights.
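
The exact schema of the artist JSON files is not shown here, but a hypothetical file on the CDN might look like the following, with the fields the iframe script needs to select and display an insight:

{
  "artistId": "example-artist-123",
  "name": "Example Artist",
  "insights": [
    {
      "id": "factoid-001",
      "text": "Example Artist played a first show at a small hometown venue in 2005.",
      "category": "biographic",
      "source": "Wikipedia",
      "starred": true
    }
  ]
}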

Insights AI pipeline

A distributed AI pipeline was developed to support the creation of insights at scale with high availability. Each of the functional components within Figure 2 is containerized with Docker technology and managed with Red Hat OpenShift on top of Kubernetes. For example, the summarization application was statically scaled to seven pods. This part of the system has the heaviest load because it supports both single- and multi-document summarization.

The Factoid Generation application manages the flow of messages and data. The app gathers the artist list and generates query-based content from Wikipedia, grammy.com, and Watson Discovery News. The retrieved articles are summarized with single-document extraction algorithms. In parallel, each artist is sent to the Factoid Expansion app, which performs multi-document summarization. Each of the summaries is posted to the two Factoid Resolution pods to perform anaphora resolution and gender classification. To keep track of state, the results of multi-document summarization are stored in a Cloudant deep factoid document database. The Factoid Generation app periodically checks whether the multi-document work for a particular artist is finished.
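
That periodic check is essentially a polling loop. The following is a minimal, generic sketch of such a loop, not the production code; in our system, the isComplete callback would query the Cloudant deep factoid database for the artist's status.

async function waitForCompletion(isComplete, { intervalMs = 30000, maxAttempts = 60 } = {}) {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        // isComplete is a caller-supplied check, for example a Cloudant status lookup
        if (await isComplete()) {
            return true
        }
        await new Promise(resolve => setTimeout(resolve, intervalMs))
    }
    return false // the multi-document work did not finish within the allotted time
}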

After both the multi-document and single-document factoid processes are finished, the results are unioned together. Next, each of the candidate factoids is sent to five Factoid Machine Learning pods for quality assessment and categorization. IBM NeuNets performed a deep neural network architecture search (DNNAS) to find the best network topology, yielding a quality model with 88% accuracy. Each word of the text was encoded into a large feature vector using global vectors for word representation (GloVe) and input into the neural network, which produced a quality classification with a probability from a Softmax function. If the candidate insight was classified as good with a higher probability than bad, the text was sent to the IBM Natural Language Classifier service. The service uses an ensemble of techniques that includes support vector machines (SVMs) and convolutional neural networks (CNNs) to provide a categorization such as interesting, quote, record breaking, biographic, career stat, inspiration, connection, charity, breakthrough, nonmusical, and scandal. Training both algorithms, IBM NeuNets and Natural Language Classifier, required support from five human annotators.
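
The gate between the two models can be sketched as follows. This is a simplified illustration, not the production pipeline: classifyQuality and categorize are hypothetical stand-ins for calls to the NeuNets-built quality model and the Natural Language Classifier service.

async function evaluateFactoid(factoid, classifyQuality, categorize) {
    // classifyQuality is assumed to return Softmax probabilities, e.g. { good: 0.91, bad: 0.09 }
    const quality = await classifyQuality(factoid.text)
    if (quality.good <= quality.bad) {
        return null // discard factoids that are more likely bad than good
    }
    // categorize is assumed to return a label such as "biographic" or "career stat"
    const category = await categorize(factoid.text)
    return { ...factoid, quality: quality.good, category }
}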


Figure 2. The process flow of creating insights

The human annotators went through 47,880 insights and labeled 9,500 of them as good or bad, with a specific categorization. Each annotator was given a human annotation guide that defined the rules for specific labels. The surface form of the insight must be grammatically correct, properly punctuated, and free of ambiguous pronouns, and time references must be absolute. The semantic form of the insight must be a substantiated claim or opinion, must not be controversial, must be free of profanity, and must be relevant to the artist. As the annotators labeled data on the first day, questions and disagreements within the group were addressed to ensure consistency while teaching the algorithms. The training data was exported into CSV files and used by both IBM NeuNets and Natural Language Classifier to turn factoids into insights.
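
As a rough illustration of that export, the following sketch turns labeled factoids into CSV rows of text followed by a label; the exact column layout used by our training jobs is an assumption here, and a real export would rely on a proper CSV library for escaping.

function toTrainingCsv(labeledFactoids) {
    // Each row: quoted factoid text, then its label (for example "good", "bad", or a category)
    return labeledFactoids
        .map(({ text, label }) => `"${text.replace(/"/g, '""')}",${label}`)
        .join('\n')
}

// Example usage with hypothetical annotations
const csv = toTrainingCsv([
    { text: 'The artist began performing at age 12.', label: 'biographic' },
    { text: 'This factoid is too vague to verify.', label: 'bad' }
])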

Insight orchestration

The Factoid Generation application brokers messages and data between Red Hat OpenShift pods and IBM public cloud services. The application is written in Node.js to take advantage of the runtime’s single-threaded, asynchronous execution model. Traffic can be spread across multiple paths, such as the single- and multi-document summarization flows, and joined when an artist workload is complete. The following code example shows how we use Node.js Promises to run parallel jobs.

// Accept an array of artist names as input
// Runs single-doc and multi-doc processes in parallel
async function main(i) {
    let processes = []
    processes.push(() => s_doc(i), () => m_doc(i))
    // Run in parallel; individual failures in components are acceptable
    let result = await Promise.allSettled(processes.map(run => run()))
    result = post_processing(result) // Add factoid IDs and remove duplicates
    ml_run_stats(result) // Get stats on projected ML run time
    let final = await ml_eval(result) // ML evaluation of factoids
    return await store_factoids(final)
}

// Single-document flow: gather content, then summarize each article
async function s_doc(input) {
    let content = await sd_run(input) // Information gathering
    return await sd_summarize(content) // Information summarization
}

// Multi-document flow: information gathering and summarization in one step
async function m_doc(input) {
    return await md_run(input)
}

After the results are combined, we post-process them to remove duplicates and add factoid identifiers. Next, the machine learning pipeline is run to assess the quality and category of each insight. Finally, the results are stored within a Cloudant document database. From there, a Python application pulls the completed insights and constructs JSON files for each artist that are uploaded to the CDN.

Computing at scale

From a consumer perspective, tens of millions of music fans will be livestreaming the webcast. The webcast experience uses an iframe that juxtaposes the insights with the video content. To protect our origin servers, we upload the insights about each artist to the IBM CDN. The iframe loads JavaScript that receives an artist identifier from the parent frame. The artist identifier is used to look up the appropriate JSON file from edge servers around the world. From the JSON file, the JavaScript methods pick which insights to show on top of backstage media types such as photos, articles, and VoDs.
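
The following is a simplified sketch of that client-side flow, not the production script: the parent frame posts the artist identifier to the iframe, which fetches the artist's JSON file from the CDN and overlays one of its insights. The CDN URL, message format, and element ID are hypothetical.

window.addEventListener('message', async (event) => {
    const artistId = event.data && event.data.artistId
    if (!artistId) {
        return
    }
    // Fetch the artist's insight file from the nearest CDN edge server
    const response = await fetch(`https://cdn.example.com/insights/${artistId}.json`)
    const artist = await response.json()
    // Pick one insight to overlay on the current photo, article, or VoD
    const insight = artist.insights[Math.floor(Math.random() * artist.insights.length)]
    document.getElementById('insight-overlay').textContent = insight.text
})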

To process over 18 million documents into 150,000 factoids, which were further refined into 1,068 insights for 900 nominees, we needed a flexible and compute-rich platform. The combination of Docker containers and the Red Hat OpenShift platform allowed us to scale our compute and memory for batch processing. Our Red Hat OpenShift cluster has 6 worker nodes, each with 4 vCPUs, 16 GB of RAM, a 25 GB SSD primary disk, a 100 GB SSD secondary disk, and 1 Gbps network speed. At peak, when we were processing over 18 million documents, we used 85% of CPU and 95% of memory. Each of the containers was scaled to its maximum pod setting to prevent any delay from potential scale-outs. The entire AI pipeline was supported by 6 applications with 6 Docker images and 17 pods. The configured cluster allowed us to process all 900 nominees within 10 hours.

Development of the pipeline took place in Visual Studio Code. When code was ready, changes were committed and pushed to GitLab. Deployment runners built the Docker images on the GitLab machine and pushed them to the Red Hat OpenShift image repository. When an application detected an image change, the new image was rolled out to all application-specific pods. Each of the applications was exposed with routes so that its functions were available as Representational State Transfer (REST) services. The rollout of changes follows the canary deployment pattern so that current workloads are not disrupted.


Figure 3. Development flow of deployment to Red Hat OpenShift

User experiences

To manage the 150,000 factoids, we built a Factoid Workflow user experience. Designated reviewers log in to the tool and select an artist tile. The artist tiles can be filtered by name, number of factoids, and popularity. The artist’s factoids are then locked so that multiple users do not have change conflicts when reviewing. Any factoids within the “Waiting For Review” pane must be either accepted or rejected as candidates. The factoid text, source (such as Wikipedia), machine learning quality score, and category are available to assist the reviewer. The “Show Factoid Filter” can be used to search on keywords, machine learning quality thresholds, negation words, source, and so on.


Figure 4. The insights review user experience

After a factoid has been promoted to the candidate pane, an independent reviewer must approve, reject, or demote the factoid. In this view, the context surrounding the factoid can be viewed to determine whether slight edits are required to disambiguate any text. If edits are required, a flag can be selected to notify downstream editors to review the text. The approved and rejected factoid panes show the end state of candidate factoids. The approved factoids are insights that can be pushed to JSON files during the live show. A star indicator denotes outstanding factoids that should be prioritized for inclusion in the artist JSON files.

Figure 5 shows the overlay of Watson Insights on top of multimedia content during the event. We provide insights on all backstage media types such as VoD, articles, and photos. For example, we place insights on top of the images.


Figure 5. Insights overlaid on backstage multimedia content

During the livestream, we have an administration tool that burns the insights into the stream. The insight is composited and transcoded into the m3u8 stream for public consumption.


Figure 6. The live factoid approval tool for live streaming

It’s about music

The GRAMMYs aren’t just about the awards themselves. The show is about music and the artists who make it. We built this unique IBM Watson integration to take music fans deeper than ever before, giving them an opportunity to discover new things about their idols during music’s biggest night.

Tune in to GRAMMY.com on or after January 26 to experience GRAMMY Insights with IBM Watson, an experience that will bring you closer to your favorite artists at the 2020 GRAMMY Awards.