
Trusted AI-generated content at the 2022 Championships

A focus on trustworthy AI brings explainability, transparency, and accuracy enhancements to your Wimbledon experience

By

Aaron Baughman,

Sara Perelman,

Eris Calhoun,

Nick Wilkin

During this year’s Wimbledon Championships, tennis fans around the world will be able to experience AI-generated content across the Wimbledon digital platforms, enabling fans to dive deeper into the contextual understanding behind tennis players and their upcoming matches.

Using artificial intelligence (AI) from IBM Watson, IBM can formulate a tennis player’s Power Index, or relative performance indicator. In addition, the AI models will predict each player’s likelihood to win for all singles matches during the Championships. This information is broken down into human-centered content across the Wimbledon digital platforms, showing fans the factors behind a player’s prediction and how the model arrives at these conclusions. The predictions include uncertainty quantification, which, in turn, increases the likelihood that tennis fans will accept both the reasoning and the prediction for a given match. Additional information in the form of sentences simplifies and provides rationale for a player’s Power Index value and likelihood to win prediction.

For the 2022 Championships, with a focus on trustworthy AI, IBM is enhancing the innovative technology behind the “IBM Power Index” (IPI) and “Match Insights with Watson.”

  • The “IBM Power Index” is an AI-powered analysis of player performance. The Tennis Tour ranking systems use 52 weeks of historical data to quantify player performance. To complement these, the Power Index focuses on a player's most recent history, combining advanced statistical analysis, the natural language processing of IBM Watson, and the power of IBM Cloud to analyze daily performance data, mine media commentary, measure player momentum, and direct the attention of fans to the most compelling matchups.

  • “Match Insights with Watson” are AI-generated fact sheets that help fans quickly get up to speed on every singles match at Wimbledon. The algorithms use advanced AI and IBM Cloud to mine the most recent player statistics and media commentary for insight, breaking down the individual elements of the IBM Power Index, sharing relevant quotes from various media sources, and constructing a natural language summary of key performance metrics.

We are also unveiling “Win Factors” that follow the same trustworthy AI principle. This new feature focuses a tennis match’s prediction on transparency, explainability, and accuracy by showcasing the top reasons why a player is predicted to win their upcoming match. The unification of “how does a model work” and “why is a player predicted to win” into a singular experience brings a new level of understanding to the game of tennis.

Trustworthy AI in tennis

Artificial intelligence systems have become increasingly complex and fundamental to everyday human decision making. In tennis, fans make decisions about which matches to watch, what players to follow, who is the next Grand Slam champion, where they should direct their eyes, which platform they should engage on, and where they should spend monetary capital. When fans can understand the reasoning behind the IBM Power Index and Likelihood to Win, they are more inclined to trust it. To trust a match prediction, fans need to know it is fair, reliable, transparent, explainable, and human-centered around the domain of tennis.

Figure 1. AI pillars of trust at Wimbledon
Image showing AI pillars of trust

At Wimbledon, we focused on five AI pillars of trust:

  1. Human-Centered Understandability: Each prediction and insight matches the terminology and vernacular of tennis. The domain match brings familiarity and provides value to tennis fans.
  2. Fairness: Tennis predictions should be fair and unbiased irrespective of biographical information and order of play. Within the IBM Power Index, we introduced a debiasing term that will not penalize idle players for not competing yet in the round.
  3. Explainability: Predictions must be understandable with supporting and refuting evidence. Natural language generated (NLG) sentences and insights are produced to bring contextual awareness about players.
  4. Transparency: The black box predictive model about tennis match winners should be elucidated to increase fans’ awareness. The IBM Win Factors show how the AI model works based on predictors.
  5. Uncertainty Quantification: The accuracy of tennis models must be reliable and trusted. The predictions from the Likelihood to Win model are accompanied by uncertainty intervals.

User-focused AI anticipates the what, why, and how questions by providing trustworthy player and match experiences. Techniques within the IBM Uncertainty Quantification 360 Software Developer Kit (SDK) increased the reliability of the model’s predictions. Blackbox Metamodel Regression added an uncertainty prediction to our core match winner model. The range between the upper and lower bounds of a prediction was used to adjust each player’s likelihood to win confidence levels. The core prediction model was trained with IBM AutoAI and deployed on IBM Watson Machine Learning.
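As a rough illustration of that last adjustment step, the sketch below shrinks a win probability toward 50% as the uncertainty interval widens. The adjustment rule, the function name, and the `max_width` parameter are assumptions made for illustration; the article does not state the formula that was actually used.

```python
def adjust_win_probability(p_win: float, lower: float, upper: float,
                           max_width: float = 1.0) -> float:
    """Shrink a win probability toward 0.5 in proportion to interval width.

    Hypothetical rule: a zero-width interval leaves p_win unchanged, while an
    interval as wide as max_width collapses the prediction to a coin flip.
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be in [0, 1]")
    if upper < lower:
        raise ValueError("upper bound must not be below lower bound")
    if max_width <= 0.0:
        raise ValueError("max_width must be positive")
    width = min(upper - lower, max_width)
    confidence = 1.0 - width / max_width
    return 0.5 + (p_win - 0.5) * confidence
```

Under this rule, a raw prediction of 0.8 with an interval of 0.7 to 0.9 would be reported at roughly 0.74, while the same prediction with no interval width stays at 0.8.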

Next, we exposed how the prediction model worked. The IBM AutoAI match prediction model was exported into embeddable code. The Python sklearn pipeline was passed as a parameter to the IBM AI Explainability 360 SDK, which used both the SHAP and LIME algorithms to rank predictor importance. A debiasing routine was also added to the IBM Power Index so that a tennis player’s value was not penalized if they had not yet played in a round. Because the IBM Power Index score was a predictor within the match prediction model, this increased the fairness of both the IBM Power Index and the Likelihood to Win system.

Evidence was accumulated around each insight and prediction to show why our model predicted certain outcomes and why each player was assigned a Power Index value. We used Watson Discovery, IBM Natural Language Understanding, IBM Natural Language Processing core, T5 transformer models, and natural language generation techniques to create understandable context. For example, the Win Factors showed performance and statistics under each of the predictors. Further, Match Insights with Watson provides factoids about players to enhance fans’ understanding of each tennis player going into a match.

Figure 2. User-focused AI
Image showing how, what, why

Trustworthy tennis predictions

Each upcoming ladies' and gentlemen’s head-to-head match is provided on the digital platforms with a predicted outcome. If the players have played each other previously, a predictive model with 19 semi-independent variables provides a prediction. The variables include each player’s Power Index value, surface performance, ratio of games won, age, quality of recent wins, and media sentiment. For players who have never played against each other, a predictive model with 14 semi-independent variables is used. The win probabilities take uncertainty quantification into account to increase match prediction accuracy levels. Each of the win probabilities is visualized through a doughnut plot with Wimbledon branded colors.

When tennis fans tap the “Why?” button, they are able to see the Win Factors for the predicted winner. The Win Factors are ranked by predictive importance, with the most important variable first. A sentence that is paired with each of the top three predictors shows the evidence that supports the outcome. At the bottom of the screen, the IBM Power Index depicts the overall player rank, which is another explanatory variable.

In the next tab, the “In The Media” section provides additional punditry context about each player. This provides users with tennis context that surfaces additional facts and influences the likelihood to win. The “At A Glance” tab provides a factually accurate set of sentences based on tennis statistics. Natural language generation techniques produce the information about both head-to-head players.

Figure 3. Predicted head-to-head match with Trustworthy AI (all content is test data only)
Predicted head-to-head-match

The predictions architecture is shown in Figure 4. A Red Hat OpenShift application for likelihood to win is called after each player is ranked by the Power Index system. The application pulls historical information from Sportradar as well as data from IBM Db2. The IBM Db2 database stores daily IBM Power Index results such as forecasted media punditry and volume, surface play quality, and recent play quality. Feature vectors are created for each match and posted to IBM Watson Machine Learning, which runs both head-to-head predictive models. The models were trained by IBM AutoAI, which ran hyperparameter optimization and synthesized new features to find the best-performing model based on accuracy.
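The sketch below illustrates how a match's feature vector might be packaged before it is posted for scoring. The `input_data` / `fields` / `values` layout follows the request format that Watson Machine Learning online deployments accept; the model labels and feature names are hypothetical, and no network call is made.

```python
from typing import Dict, List, Optional


def select_model(previous_meetings: int) -> str:
    """Pick which head-to-head model applies (labels are hypothetical)."""
    return "h2h_with_history" if previous_meetings > 0 else "h2h_first_meeting"


def build_scoring_payload(field_order: List[str],
                          features: Dict[str, Optional[float]]) -> dict:
    """Arrange one match's features in the order the model expects.

    Features missing from the dictionary are sent as None so that a later
    imputation step can fill them in.
    """
    row = [features.get(name) for name in field_order]
    return {"input_data": [{"fields": list(field_order), "values": [row]}]}
```

For example, `build_scoring_payload(["power_index_diff", "age_diff"], {"power_index_diff": 4.2})` yields a single-row payload whose second value is `None`.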

Figure 4. Tennis player prediction architecture
Player prediction architecture

After the likelihood to win application received prediction results, the data was sent to the IBM AI Explainability 360 and Uncertainty Quantification 360 SDKs. Both of the SDKs were embedded within a serverless application called Trustworthy AI that ran on IBM Code Engine. First, SHAP was run with an embedded prediction model that was exported from IBM Watson Machine Learning. Shapley values were used to rank and quantify the predictive importance relative to the winner. Next, the Blackbox Metamodel Regression was trained and run for each prediction. This provided an uncertainty interval that was used to adjust the prediction confidence.
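To make the ranking step concrete, the sketch below computes exact Shapley values by brute force for a toy model and then orders the predictors by absolute contribution. It is not the SHAP library's API, which approximates these values efficiently for real models; the function names and the baseline-substitution scheme are assumptions chosen for illustration, and the approach is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Tuple


def shapley_values(predict: Callable[[List[float]], float],
                   instance: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values; absent features take their baseline value."""
    n = len(instance)

    def value(subset: frozenset) -> float:
        x = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return predict(x)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def rank_predictors(names: Sequence[str],
                    phi: Sequence[float]) -> List[Tuple[str, float]]:
    """Order predictors by the magnitude of their contribution."""
    return sorted(zip(names, phi), key=lambda pair: abs(pair[1]), reverse=True)
```

The contributions sum to the difference between the prediction for the match and the prediction for the baseline, which is what allows each one to be read as a share of the predicted outcome.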

The results of the prediction and explainability system were returned to the likelihood to win application. Next, the payload was posted to a Red Hat OpenShift application called Win Factors NLG. In this application, the explanatory variables and ranks from SHAP were translated into enriched sentences. Evidence was pulled from Sportradar and IBM Db2 to construct formatted sentences. The sentences were created from rules and subsequently paraphrased by a Hugging Face T5 transformer. The results of the Win Factors NLG application were returned to the likelihood to win application and saved within an IBM Cloudant document.
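The rule-based part of that translation could look something like the following sketch, which pairs each of the top-ranked predictors with a sentence template. The template wording, the factor names, and the fallback sentence are all hypothetical, and the T5 paraphrasing pass is left out.

```python
from typing import Dict, List

# Hypothetical templates keyed by predictor name.
TEMPLATES = {
    "power_index": "{player} enters the match with an IBM Power Index of {value}.",
    "surface_performance": "{player} has won {value} of recent matches on grass.",
    "media_sentiment": "Media sentiment around {player} is {value}.",
}
FALLBACK = "{player} has an edge in {factor} ({value})."


def win_factor_sentences(player: str, ranked_factors: List[str],
                         evidence: Dict[str, object],
                         top_n: int = 3) -> List[str]:
    """Render one sentence for each of the top_n ranked predictors."""
    sentences = []
    for factor in ranked_factors[:top_n]:
        template = TEMPLATES.get(factor, FALLBACK)
        sentences.append(template.format(
            player=player,
            factor=factor.replace("_", " "),
            value=evidence.get(factor, "n/a"),
        ))
    return sentences
```

Keeping the templates in a dictionary means a new predictor still produces a readable fallback sentence before a dedicated template is written for it.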

An AI Command and Control Center supports human reviews. The React-based application enables humans and AI to work together to edit, approve, or reject content before publication into the Wimbledon ecosystem. The Win Factors are sorted by predictive power so that each human reviewer can view the most important pieces of evidence first. When the publish button is tapped, the content is saved to IBM Cloudant in preparation for a final publisher IBM Code Engine application.

At the same time, the IBM Power Index data is being produced by another system.

How does the IBM Power Index work

The IBM Power Index is the measure of a player's strength going into and throughout a tournament. The factors that contribute to the IPI provide strong indicators as to who will win a head-to-head match. The IPI is complementary to the traditional tour rank. Over a tennis season, a player's ATP or WTA ranking is based on the number of points accrued over 19 different tournaments within a 52-week rolling window, and points that fall outside the window are dropped. As its foundation, the IPI uses relevant industry punditry observed through thousands of news sources combined with player performance to create an index of a player's momentum. There is more overlap between the tour rank and the IPI leading up to a Grand Slam tournament than during one. During a Grand Slam tournament, the crowd's commentary becomes more focused on individual players and uses more precise language, which helps the two standings move toward independence.

Each day, the IBM Power Index is updated and available on a Leaderboard to track the road to the Championship. Figure 5 shows the experience on a mobile device.

Figure 5. The IBM Power Index Leaderboard mobile experience (all content is test data only)
Leaderboard

Here is how the IPI works. Over 25 factors contribute to the IPI. Within the player performance dimension, a player's win velocity, overall win ratio, and projected future win ratio account for win power. Next, the quality of a win, rank difference, injury status, tournament participation boost, round progression award, and win margin boost reward players for meaningful play. Within natural language, the crowd's opinion about a player's performance and health is a large factor within the IPI. Both content sentiment and normalized volume are forecasted a few days ahead to provide leading indicators for the IPI. At the same time, the overall assessment of the player adapts to the current Grand Slam tournament with a refocus metric. This enables the IPI to rapidly adapt to current play outcomes.

The IPI becomes an insight with the application of a predictive model called "Likelihood to Win." The model has 30 features that include comparative elements of the IPI. A head-to-head singles match is assessed by the model. A win probability is assigned to each player. The win probability can shift day by day as the data around punditry and performance changes. Figure 6 shows the overall architecture of the IBM Power Index system.

Figure 6. Power Index System
Power Index System overview

The core IPI system runs on IBM Cloud Functions, a serverless technology that can run code bootstrapped by containerized technologies. A series of triggers runs actions on predefined schedules. The long-running ranking action calls itself as it processes players. Statistical data is pulled from Sportradar, while punditry is queried through IBM Watson Discovery. The function code calls Red Hat OpenShift RESTful services that apply natural language processing techniques to the text. The volume and sentiment trends of the queried data are forecasted a few days into the future by IBM Watson Core OneNLP. A spike forecaster that was trained by IBM AutoAI and deployed on IBM Watson Machine Learning helps to discover anomalous future situations. The results are stored within Db2.

At the end of each player's IPI process, a feature vector is posted to a likelihood to win Python application running on Red Hat OpenShift. The feature vector is normalized, with missing values imputed, before being posted to the likelihood to win predictive model. The resulting win probability for each of the two players within a match is saved to Db2. A Cognos Dashboard pulls data from Db2 into data visualizations. In parallel, IBM Code Engine aggregates the Likelihood to Win and IPI data into a JSON file for upload to IBM Cloud Object Storage. The IBM Code Engine publisher creates data that feeds into the myWimbledon experience.
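A minimal sketch of the normalization and imputation step is shown below. The article does not say which methods were used, so default-value imputation and min-max scaling are assumptions here, as are the function and feature names.

```python
from typing import Dict, Optional, Tuple


def impute_and_normalize(vector: Dict[str, Optional[float]],
                         defaults: Dict[str, float],
                         bounds: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Fill missing features with defaults, then min-max scale into [0, 1]."""
    normalized = {}
    for name, (low, high) in bounds.items():
        raw = vector.get(name)
        if raw is None:
            raw = defaults[name]  # hypothetical fallback, e.g. a field average
        if high == low:
            normalized[name] = 0.0  # a constant feature carries no signal
        else:
            scaled = (raw - low) / (high - low)
            normalized[name] = min(max(scaled, 0.0), 1.0)  # clip outliers
    return normalized
```

Iterating over `bounds` rather than over the incoming vector guarantees that every feature the model expects is present in the output, even when the upstream data omits it.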

Experience the IBM Match Insights with Watson

While our system creates the IBM Power Index, the Match Insights system is applying natural language processing, AI, and statistical analysis to tennis-related content. The IBM Power Index, Likelihood to Win, and Match Insights are joined together within a singular experience, as shown in Figure 7. The most meaningful insights provide transparency to both the upcoming tennis match and to the Power Index.

Figure 7. The IBM Match Insights with Watson experience (all content is test data only)
Match Insights experience

Understandable context: Match Insights with Watson “In the Media”

The “In the Media” section of the experience shown in Figure 3 supports the core principles of Trustworthy AI. The human-centered insights that are derived from hundreds of thousands of articles from thousands of sources provide context about each tennis player. The insights provide evidence and media punditry about both players that influence the likelihood to win prediction and the IPI. This helps fans to understand and ultimately trust our AI insights. In fact, we started by asking ourselves key tennis questions.

We decided to focus on core media outlets to answer key questions. What makes a player interesting? What happened in their career to lead them to play at Wimbledon? Match Insights with Watson seeks to uncover the answers to these types of questions, along with any other facets of a player's background that make them stand out from the field.

To achieve this, Watson searches for information on a given player across millions of news articles, blog posts, and other online media, supplemented by deep dives on a targeted selection of tennis sources, such as https://www.wimbledon.com. Watson gains a deeper understanding of the editorial content through natural language processing enrichments. Articles are categorized by their prevalent topics or concepts, and relationships are drawn between entities such as people and places. Articles deemed relevant to both the player and the topic domain are then summarized using extractive algorithms. The nature of extracting sentences from a body of text means that a degree of context is lost in the process. Pronouns and any time-relative references such as "two years ago" might be disconnected from their roots, leaving the summarized sentence difficult to understand. To mitigate this, we attempt to resolve orphaned coreferences using the sentences within two positions before or after the extracted summary.
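The context window used for that lookup can be sketched as follows. Only the windowing is shown; the coreference resolution itself is outside the scope of this example, and the function name is hypothetical.

```python
from typing import List


def context_window(sentences: List[str], index: int, radius: int = 2) -> List[str]:
    """Return the extracted sentence plus up to `radius` neighbors on each side."""
    if not 0 <= index < len(sentences):
        raise IndexError("index is outside the article")
    start = max(0, index - radius)
    end = min(len(sentences), index + radius + 1)
    return sentences[start:end]
```

The window is clipped at the article boundaries, so a summary sentence taken from the first line of an article simply has no preceding context to search.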

Having collected relevant articles, extracted salient information, and resolved any lingering coreferences, the next stage is to assess the quality of each sentence. Two dimensions are used to determine the quality of a given snippet: its grammatical coherence, which is determined by scikit-learn surface form parse rules and decision trees, and its topic alignment, which is measured by a trained machine learning model. Sentences that pass a quality threshold are determined to be insightful and are stored in our IBM Cloudant natural language processing store as factoids. The factoids are stored as JSON documents by topic/player and are then sent through our Insights Human Review Tool. Here, human operators review and approve the stored factoids. Figure 8 depicts the architecture of the factoid system.
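The quality gate and the resulting document might be sketched like this. The threshold value, the rule that both scores must clear it, and the JSON field names are assumptions; the article only states that two quality dimensions are scored and that factoids are stored by topic and player.

```python
from typing import Dict, List, Tuple


def build_factoid_document(player: str, topic: str,
                           candidates: List[Tuple[str, float, float]],
                           threshold: float = 0.7) -> Dict[str, object]:
    """Keep sentences whose coherence and topic alignment both pass.

    Each candidate is (sentence, coherence_score, topic_alignment_score).
    """
    kept = [
        {"text": text, "coherence": coherence, "topic_alignment": alignment}
        for text, coherence, alignment in candidates
        if min(coherence, alignment) >= threshold
    ]
    return {
        "player": player,
        "topic": topic,
        "status": "pending_review",  # humans approve before publication
        "factoids": kept,
    }
```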

Figure 8. Tennis factoids architecture
Factoid architecture

Trustworthy context: Match Insights with Watson “At a Glance”

Fact-based evidence in the form of sentences derived from tennis statistics helps fans to understand recent tennis player performance. The information complements the Likelihood to Win predictions and the IPI to reinforce user trust. The robust sentences align the on-court action with fans’ desire to understand the what, why, and how around our predictions.

The on-court action at Wimbledon produces dozens of distinct statistics for fans and tennis experts to analyze. These statistics are particularly useful when previewing an upcoming matchup because they can indicate the relative strengths and tendencies of each player. Does this player hit many winners from her forehand? Does the player often approach the net? Statistics can answer these questions and many more, giving fans insight into the forthcoming match. A skilled analyst can study data tables and uncover the areas in which each player stands out. Match Insights brings this level of comparative analysis to statisticians and casual fans alike by instead presenting the data in natural language.

IBM maintains databases that store these statistics and other relevant information using the IBM Db2 on Cloud service. In their raw form, these stats are still difficult to interpret. Comparisons are difficult to make because matches can differ in length, from under 1 hour to over 4 hours. To normalize for this variability, IBM calculates per-point frequencies. Each frequency is then converted to a rank value with respect to that statistic among the entire tournament field of 128 competitors.
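The sketch below shows why the per-point normalization matters before ranking. The player labels and counts are invented, and ties are broken by input order, which is an assumption.

```python
from typing import Dict


def per_point_frequency(count: int, points_played: int) -> float:
    """Convert a raw match statistic into a per-point rate."""
    if points_played <= 0:
        raise ValueError("points_played must be positive")
    return count / points_played


def rank_in_field(frequencies: Dict[str, float]) -> Dict[str, int]:
    """Rank players by frequency, with 1 as the highest rate in the field."""
    ordered = sorted(frequencies, key=frequencies.get, reverse=True)
    return {player: position for position, player in enumerate(ordered, start=1)}
```

A player with 30 forehand winners over 150 points (0.20 per point) ranks ahead of a player with 40 winners over 250 points (0.16 per point), even though the second player has the larger raw total.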

The most extreme values are the items that will be most interesting to the tennis audience. Additionally, Match Insights draws contrasts by highlighting the stats with the largest percentile differences between the two players in the matchup. After these key stats are selected, the system converts the stats to natural language. To do this, the system must understand the various components of a statistical highlight. These components include the subject phrase, verb phrase, and contextual phrase. As humans generate natural language using various word choices and syntactical ordering, the AI system also varies these elements to produce human-like language. The output structures and diction are then selected according to probability. At this level of variety, the natural language generation system, which is powered by open source natural language generation and IBM Research technologies, can produce hundreds of unique texts for each match's selected stats. Additional processing then confirms grammatical correctness such as pronoun, article, and verb agreement.
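A simplified sketch of the selection and generation steps follows. The phrase lists, their probabilities, and the function names are hypothetical, and the subject phrase is held fixed for brevity; the production system draws on open source and IBM Research natural language generation technologies that are not reproduced here.

```python
import random
from typing import Dict, List

# Hypothetical phrase options with selection probabilities.
VERB_PHRASES = [("has won", 0.5), ("has claimed", 0.3), ("has taken", 0.2)]
CONTEXT_PHRASES = [("this tournament", 0.6), ("so far at Wimbledon", 0.4)]


def largest_contrasts(pct_a: Dict[str, float], pct_b: Dict[str, float],
                      top_n: int = 3) -> List[str]:
    """Pick the stats where the two players' percentiles differ the most."""
    shared = set(pct_a) & set(pct_b)
    return sorted(shared, key=lambda s: (-abs(pct_a[s] - pct_b[s]), s))[:top_n]


def _pick(options, rng: random.Random) -> str:
    phrases = [phrase for phrase, _ in options]
    weights = [weight for _, weight in options]
    return rng.choices(phrases, weights=weights)[0]


def render_highlight(player: str, stat_phrase: str, rng: random.Random) -> str:
    """Assemble subject, verb, and contextual phrases into one sentence."""
    verb = _pick(VERB_PHRASES, rng)
    context = _pick(CONTEXT_PHRASES, rng)
    return f"{player} {verb} {stat_phrase} {context}."
```

Passing in a seeded `random.Random` keeps the output reproducible for human review while still varying the wording from match to match.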

The final task of the Literature Generator web service is to persist the texts and corresponding metadata to an IBM Cloudant NoSQL database on IBM Cloud, which feeds the human review UI. After a Match Insights package receives approval, an IBM Code Engine application joins the statistics with corresponding factoids and writes a JSON document to a bucket on IBM Cloud Object Storage. The contents of this bucket are delivered on Wimbledon.com using the IBM Content Delivery Network. The Content Delivery Network is well equipped to serve the high traffic for these data files as they power the Match Insights features on Wimbledon.com and apps.

Figure 9. Natural language generation for tennis architecture
NLG for tennis architecture

Overall trust-focused system

Figure 10. Overall AI system
Overall AI system

Each content producer system, such as Factoids, NLG from Stats, NLP Optimization, AI Highlights, and Power Index, creates deep and diverse types of information about gameplay. The content is stored in three independent IBM Cloudant NoSQL databases. At the same time, the current game state, such as players' statistics, is streamed into an IBM Db2 database. All of the data is joined together by a Python publisher application. The application is containerized with Docker and run as a Flask plus RESTful service. The image is pushed to the IBM Cloud image registry and run on IBM Code Engine. A subscription-based service was created to schedule when the publisher API is called. The following IBM Cloud CLI command creates that subscription.

ibmcloud ce sub ping create --name IPIpublisherscheduledev --destination IPIpublisherdev --data '{}' --schedule '*/10 * * * *' --path example

IBM Cloud Code Engine runs containers, batch jobs, apps, and functions in a serverless fashion. Developers can therefore run a broad set of workloads without dealing with IT infrastructure concerns, which yields CapEx and OpEx savings along with a very high level of productivity.

After each run, player lists and IBM Power Index data are converted to JSON files and uploaded to IBM Cloud Object Storage. The IBM Cloud Object Storage bucket is fronted by a Content Delivery Network and consumed by Wimbledon experiences.
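The conversion step could be sketched as below. The JSON field names and the leaderboard layout are hypothetical, and the upload to IBM Cloud Object Storage is omitted.

```python
import json
from typing import Dict


def build_leaderboard_json(players: Dict[str, str],
                           power_index: Dict[str, float]) -> str:
    """Join player names with Power Index values and serialize a leaderboard."""
    rows = [
        {"player_id": pid, "name": name, "power_index": power_index[pid]}
        for pid, name in players.items()
        if pid in power_index  # skip players without a score yet
    ]
    rows.sort(key=lambda row: row["power_index"], reverse=True)
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return json.dumps({"leaderboard": rows}, sort_keys=True)
```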

Enjoy our tennis predictions across the Wimbledon digital platforms

Automatic reasoning with uncertainty built on the pillars of trustworthy AI deeply engages tennis fans around the world. Each prediction is supported with context that promotes transparency, explainability, fairness, uncertainty quantification, and human-centered understandability. The IBM Power Index provides a player-specific performance index relative to tennis peers. Player performance over time and key factors indicate reasons for a player’s IBM Power Index. The culmination of this trustworthy AI experience will help fans deeply engage with tennis players and matchups during the 2022 Wimbledon Championships.