In 2017, nearly 60 million Americans played in a fantasy sports league. Most leagues require a cash buy-in, with the winners receiving cash prizes. During the NFL season, self-proclaimed experts crowd the web with analysis, predictions, and advice on how to manage one’s fantasy football team. This punditry has become so widespread that FantasyPros, a major fantasy site, even ranks online fantasy football experts by the accuracy of their predictions.
These articles contain a treasure trove of information, but there are far too many of them for any one person to read, process, and weigh. Yet the stakes, in both money and bragging rights, are high. What if there were a way to actually read all these reports, aggregate them to predict NFL player performance, manage your fantasy team accordingly, and dominate your league?
It turns out that Aaron Baughman, an IBM data scientist, did just that: using Watson’s predictions, he went 13-0 and won his fantasy football league’s regular season. Combining historical NFL data, current news articles, opinion pieces, and injury reports, Baughman and his team built Fantasy Insights with Watson, a system that predicts the likelihood that a football player will boom or bust in a given week based on the online content about that player.
But how was Watson able to pull the relevant, football-related information out of millions of words? The language used in writing about fantasy football is far too specialized for generic entity extraction. Enter Watson Knowledge Studio, a cloud-based application that enables developers and domain experts to teach Watson to understand the language of a topic in unstructured text. In this case, Baughman and his team used news articles about football to train a machine learning model that annotates information relevant to future player performance.
Watson Knowledge Studio stands out in the natural language processing space because it requires no programming. Instead, users define the entity types (categories of words) and relationships that appear in the text. For this project, the team built a type system of 13 entity types covering the football content of the articles, including Body_part, Player_status, Performance_metric, Coach, and Injury. It was not a comprehensive ontology of all football information; the team wanted to extract only information that would help predict player performance.
Two of the entity types captured tone: one positive (phrases like “makes an impact” and “tier-one”) and one negative (“worry,” “on the bench”). Because the project’s ultimate goal was a user experience that displays each player’s likelihood of success, whether an analysis was positive or negative was an important signal.
The team then annotated a sample corpus of articles about football players, marking every instance of any of the entity types. For example, they would highlight “questionable” as a Player_status and “immobile” as negative tone. In the example below, the passage has been annotated for mentions of the Player, negative_tone, and performance_metric entity types.
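In Watson Knowledge Studio this annotation happens in a point-and-click editor, not in code. Still, it can help to see what a single annotation records: a span of text plus the entity type a human assigned to it. The Python sketch below is a hypothetical model of that idea, not the WKS export format or API. Only the entity type names come from the article (the full 13-type system is not listed), and the sample sentence and the player name “Smith” are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Entity type names mentioned in the article; the team's full 13-type system is not listed there.
ENTITY_TYPES = {
    "Player", "Coach", "Body_part", "Injury",
    "Player_status", "Performance_metric", "negative_tone",
}


@dataclass(frozen=True)
class Mention:
    """Hypothetical record of one human annotation: a character span labeled with an entity type."""
    start: int
    end: int
    entity_type: str


def annotate(text: str, phrase: str, entity_type: str) -> Mention:
    """Label the first occurrence of `phrase` in `text` with `entity_type` (illustrative helper)."""
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type}")
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"phrase not found: {phrase!r}")
    return Mention(start, start + len(phrase), entity_type)


# Invented sample sentence; "Smith" is a placeholder, not a real player.
text = "Smith is questionable for Sunday and looked immobile in practice."
mentions = [
    annotate(text, "Smith", "Player"),
    annotate(text, "questionable", "Player_status"),
    annotate(text, "immobile", "negative_tone"),
]

for m in mentions:
    print(f"{m.entity_type:>14}: {text[m.start:m.end]!r} [{m.start}:{m.end}]")

# Tally mentions per entity type, the kind of signal a downstream model could consume.
counts = Counter(m.entity_type for m in mentions)
print(counts)
```

Restricting labels to a fixed set of types mirrors the role of the type system: annotators can only tag what the schema defines, which keeps the training data consistent across people.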
A machine learning model was trained on the annotated documents and deployed in Watson Discovery. This let Watson read and interpret millions of news articles in the context of fantasy football, extracting the same entities and relationships that Baughman and his team had annotated. The team then correlated this information with historical NFL player statistics, ultimately producing the likelihood that a player would boom or bust.
Watson Knowledge Studio (WKS) was invaluable to the development of ESPN Fantasy Football with Watson. Data scientist Coleman Newell said, “The ease of use was a lot easier to train people to annotate than I thought it might be… the model training and evaluation was really easy to compare annotators and evaluate each re-training.”