Behind the code: Supporting evidence for Fantasy Football insights

ESPN and IBM have teamed up to bring a new level of insight to fantasy football team owners: Watson AI. #FantasyFootballFace. Watson is analyzing millions of documents, videos, and podcasts from thousands of trusted sources. The ESPN Fantasy Football with Watson system has been a significant undertaking with many components.

This article is the third in an eight-part series that takes you behind each component to show you how we used Watson to build a fair, world-class AI solution.

The creation of supporting evidence for fantasy football insights

Multimedia and textual evidence solidifies trust in critical fantasy football predictions from ESPN Fantasy Football with Watson. The journey to open AI traces the path from highly complex algorithms to their outputs by answering the question “Why?” Users of the system can branch into deeper evidence to judge the veracity of the Watson system and build a long-term relationship founded on belief. Just as Sherlock Holmes consults Watson in the mystery series, fantasy football players can consult their trusted advisor: ESPN Fantasy Football with Watson.

On each player card, the “Buzz” tab shows the most relevant links to sources that support or refute the predictions from Watson. News articles from 50,000 sources on the Internet, along with ESPN videos and podcasts, are analyzed with natural language processing in Watson Discovery and with deep learning algorithms written in Python. The audio from the multimedia sources is transcribed by the Watson Media Cloud and the Watson Speech to Text service to create textual information for machine reading. The most trusted sources are candidates for boosting, which increases their influence when they are input into the document-to-vector (doc2vec) neural networks. If a minimum amount of evidence is not available from boosted sources, additional neutral sources are included. A source is trusted if its content matches our rules or if both the ESPN Football Editorial Staff and the IBM Fantasy Football Staff agree that a website has highly veracious data. Any blocklisted source, such as a satire site, is removed.
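
The source selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Article` type, the `select_evidence` function, and the example domains are hypothetical, and the sketch models only which sources are kept, not how boosting weights them inside the neural networks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    url: str
    source: str  # hypothetical: the site the article came from


def select_evidence(articles, trusted, blocklisted, min_evidence):
    """Prefer trusted sources; add neutral ones only when evidence is short."""
    # Blocklisted sources (for example, satire sites) are always removed.
    allowed = [a for a in articles if a.source not in blocklisted]

    # Trusted sources are the candidates for boosting.
    boosted = [a for a in allowed if a.source in trusted]
    if len(boosted) >= min_evidence:
        return boosted

    # Not enough trusted evidence: top up with neutral sources.
    neutral = [a for a in allowed if a.source not in trusted]
    needed = min_evidence - len(boosted)
    return boosted + neutral[:needed]
```

Calling `select_evidence(articles, trusted={"espn.com"}, blocklisted={"satire.example"}, min_evidence=2)` would return the trusted articles first and fall back to neutral ones only to reach the minimum.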


The news sources that are included as predictors are crawled for additional scrutiny. Relevance measures are calculated by the following rules:

  • Whether the player’s name is present in the title
  • The number of times a player’s name is within a document
  • The location within the article that has the first occurrence of a player’s name
  • The number of words between the player’s first and last names
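
As a rough illustration, the four measures above could be computed like this in Python. The function names, the tokenizer, and the choice to count last-name mentions are assumptions made for this sketch; the article does not say how the production system tokenizes text or combines the measures into a score.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a trailing possessive 's is stripped."""
    tokens = re.findall(r"[a-z0-9'\-]+", text.lower())
    return [t[:-2] if t.endswith("'s") else t for t in tokens]


def relevance_features(title, body, first_name, last_name):
    first, last = first_name.lower(), last_name.lower()
    title_tokens = tokenize(title)
    tokens = tokenize(body)

    # Measure 1: does the full name appear (as adjacent words) in the title?
    in_title = any(a == first and b == last
                   for a, b in zip(title_tokens, title_tokens[1:]))

    # Measure 2: mentions in the document. Assumption: count the last name,
    # since articles often drop the first name after the first mention.
    mention_count = tokens.count(last)

    # Measure 3: word index of the first occurrence of either name.
    first_position = next(
        (i for i, t in enumerate(tokens) if t in (first, last)), None)

    # Measure 4: words between the first name and the next last name.
    name_gap = None
    if first in tokens:
        rest = tokens[tokens.index(first) + 1:]
        if last in rest:
            name_gap = rest.index(last)

    return {
        "in_title": in_title,
        "mention_count": mention_count,
        "first_position": first_position,
        "name_gap": name_gap,
    }
```

A small `name_gap` and an early `first_position` both suggest that the article is really about the player rather than mentioning the name in passing.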

An additional metric of article recency is applied to the evidence so that the most relevant and timely data is readily available for users on the player card. Any article or text is filtered out if its relevance score is below our veracity threshold. The threshold is kept strict to maintain trust throughout the fantasy football season.
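
One way to picture the combination of relevance, recency, and the threshold is the short Python sketch below. The exponential decay, the `half_life_days` value, and the dictionary keys are assumptions chosen for illustration; the article does not describe the actual recency formula or threshold value.

```python
from datetime import datetime, timezone


def rank_evidence(items, now, threshold, half_life_days=7.0):
    """Drop items below the relevance threshold, then rank by relevance and recency."""
    scored = []
    for item in items:
        # Anything under the threshold is filtered out, however recent it is.
        if item["relevance"] < threshold:
            continue
        age_days = max((now - item["published"]).total_seconds() / 86400.0, 0.0)
        # Assumed recency weight: halves every `half_life_days`.
        recency = 0.5 ** (age_days / half_life_days)
        scored.append((item["relevance"] * recency, item))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored]
```

With a seven-day half-life, a highly relevant article from two weeks ago can rank below a moderately relevant one published today, which matches the goal of surfacing evidence that is both relevant and timely.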


To produce evidential insights, a Cloud Foundry Python application running in the IBM Cloud searches for highly precise evidence. Queries that include the player’s full name, team, and the terms “NFL” and “Fantasy Football” are issued against the News source in Watson Discovery. If the result set is smaller than the empirically determined minimum of 50 articles, the query is gradually relaxed into a more recall-oriented format until only the player’s full name remains in the query. This precision-to-recall spectrum is followed for the most veracious sites first; the next tier of neutral sources is used only when it is required to reach the minimum number of sources needed to avoid compromising the accuracy of the overall machine learning pipeline.
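
The precision-to-recall relaxation can be sketched as a loop that drops one query term at a time. In this Python sketch, `search` is a hypothetical caller-supplied function standing in for the Watson Discovery call, the order in which terms are dropped is an assumption, and the fallback to the neutral tier of sources is left out.

```python
MIN_ARTICLES = 50  # the empirically determined minimum from the article


def relaxed_term_sets(full_name, team):
    """Term lists ordered from most precise to most recall-oriented."""
    terms = [full_name, team, "NFL", "Fantasy Football"]
    # Drop one trailing term per step until only the full name remains.
    return [terms[:n] for n in range(len(terms), 0, -1)]


def find_evidence(full_name, team, search, min_articles=MIN_ARTICLES):
    """Run the most precise query first and relax it until enough articles return."""
    terms, results = [], []
    for terms in relaxed_term_sets(full_name, team):
        results = search(terms)  # hypothetical stand-in for a Discovery query
        if len(results) >= min_articles:
            break
    # Returns the last term list tried, even if it still fell short.
    return terms, results
```

For a well-covered player the first, most precise query usually suffices; for a lesser-known player the loop ends on the name-only query.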

Separate Cloud Foundry applications run for textual news, MP3 podcasts, and MP4 videos. The transcribed podcasts and videos are analyzed with natural language processing in Watson Discovery, where relevancy and timeliness are measured. The headlines and resource URLs of the multimedia evidence are stored in a multimedia schema in Db2 on Cloud. The unstructured news follows a similar process. The full text of each article and transcript is not stored in Character Large Object (CLOB) fields, so petabytes of disk space are not consumed.
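
The storage decision can be illustrated with a small Python sketch that keeps only the headline, URL, and scores for each piece of evidence. It uses the standard library's `sqlite3` as a stand-in for Db2 on Cloud, and the table and column names are hypothetical; the real schema is not described in this article.

```python
import sqlite3


def create_store():
    """In-memory stand-in for the multimedia schema (names are assumptions)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE multimedia_evidence (
               player_id    TEXT NOT NULL,
               headline     TEXT NOT NULL,
               resource_url TEXT NOT NULL,
               media_type   TEXT NOT NULL,
               relevance    REAL NOT NULL,
               published    TEXT NOT NULL,
               PRIMARY KEY (player_id, resource_url)
           )"""
    )
    return conn


def save_evidence(conn, player_id, item):
    """Persist pointers and scores only; any transcript in `item` is ignored."""
    conn.execute(
        "INSERT OR REPLACE INTO multimedia_evidence VALUES (?, ?, ?, ?, ?, ?)",
        (player_id, item["headline"], item["resource_url"],
         item["media_type"], item["relevance"], item["published"]),
    )
    conn.commit()
```

Because the table has no column for the transcript or article body, the player card can still link to the source while the database stays small.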

A Cloud Foundry Node.js application queries both the text and the multimedia schemas to create JavaScript Object Notation (JSON) files. The file for each player is produced asynchronously and uploaded to an Object Storage origin. The JSON resources do not have any cache headers for browsers or clients, so users’ requests go to the edge servers within the IBM Cloud Content Delivery Network (CDN). The JSON data is also set to be publicly readable for public clients. For example, in JavaScript, the Access Control List (ACL) is specified within a parameters object.

// Upload parameters for one player's JSON file; the 'public-read' ACL lets any client read it.
let params = {Bucket: this.containerName, Key: objectName, ACL: 'public-read', ContentType: 'text/json'};

On the command line, JSON files can also be updated with:

curl -X PUT "" -H "Authorization: Bearer <your token>" -H "x-amz-acl: public-read"
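
For comparison, the same kind of upload request can be assembled with Python's standard library. This is a sketch only: `build_upload_request`, the object URL, and the shape of the player evidence are hypothetical, and the request is built but not sent. It uses `application/json`, the registered JSON media type, where the JavaScript snippet above uses `text/json`.

```python
import json
from urllib.request import Request


def build_upload_request(object_url, token, player_evidence):
    """Build (but do not send) a PUT request for one player's JSON file."""
    body = json.dumps(player_evidence, sort_keys=True).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        # Same ACL header as the curl example: readable by public clients.
        "x-amz-acl": "public-read",
        "Content-Type": "application/json",
        # No cache headers are set, matching the behavior described earlier.
    }
    return Request(object_url, data=body, headers=headers, method="PUT")
```

Passing the returned object to `urllib.request.urlopen` would perform the upload against a real endpoint.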

Although trust is hard to build and easy to destroy, Watson has earned and maintains trust for millions of fantasy football users. The evidence provides a level of transparency during reasoning under uncertainty. #WinWithWatson

Check back next time as I discuss the machine learning pipeline. To find out more, follow Aaron Baughman on Twitter: @BaughmanAaron.

The ESPN Fantasy Football logo is a trademark of ESPN, Inc. Used with permission of ESPN, Inc.