Watson’s artificial intelligence helps millions of ESPN fantasy football owners make better decisions

Sports enthusiasts from around the world join millions of other fans to act as NFL team managers on the ESPN Fantasy Football platform.  Every week, team managers select a starting roster to compete against an opponent within their league.  The objective is simple: select the players with the highest likelihood of scoring the most fantasy points by accruing touchdowns, yards, receptions, and so on.  The avalanche of options, supported by the vast amount of available data, is at once intriguing and daunting. Of course there are the typical scores and stats, the ‘structured’ data that fits into a spreadsheet, but what about all the unstructured data: the millions of words printed daily in articles and blog posts? These articles contain valuable information, but tracking them all on a daily or even weekly basis requires more time than a 24-hour day allows.

Enter IBM Watson. We have trained Watson to read, comprehend, and interpret millions of news articles and social content. Watson ‘read’ over 6 million articles just to learn the game of football, and during the season it has been reading updated information from over 3,000 sources every hour. The system is built by IBM iX on the IBM Cloud and is designed for scalability, availability, and flexibility.

The insights generated from Watson’s reading are a complement to traditional stats and the expertise of fantasy football players at all levels. Even the ESPN team supporting the insightful Matthew Berry is using IBM Watson to distill millions of articles into insights.

Here’s an in-depth look at how we did it and how it works:

Artificial Intelligence Pipeline with IBM Watson

ESPN Fantasy Insights with Watson is powered by dozens of machine learning techniques.  First, Watson must be trained to understand the domain of fantasy football.  A combination of human annotators, data scientists, and developers took Watson through a series of supervised training exercises based on years of previous fantasy football content, including Internet archives.



An ontology mapped fantasy football unstructured data to 12 entity types and relationships.  A group of human annotators used Watson Knowledge Studio to mark up thousands of textual phrases and associate them with entities.  A statistical machine learning entity model was trained from the labeled data and published to Watson Discovery.  Now Watson was able to read and understand millions of news articles within the context of fantasy football.  Watson found the most relevant entities, keywords, and concepts of every article, which were then sent to a pipeline for comprehension.

The machine learning pipeline was developed to enable Watson to comprehend the unstructured text.  Over 90 gigabytes of raw text from historical fantasy football seasons were ingested into a document-to-vector (doc2vec) model. A second, more precise doc2vec model was created from several thousand lines of definitions and football encyclopedias.  The two doc2vec models were merged so that the textual representation of hundreds of articles could be converted into numerical vector representations of length 195.  The performance of the doc2vec ensemble model was outstanding: it inferred the correct answers in 97.96% of analogy tests.  A second, perhaps more difficult, test was the keyword test, with 76.01% correct performance.  With the raw text converted into numerical representations, the pipeline calls a deep learning layer of models.
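To make the analogy test concrete, here is a minimal, self-contained sketch in Python. The tiny three-dimensional vectors and the four-word vocabulary are invented purely for illustration; the production doc2vec models learned 195-dimensional vectors from gigabytes of text.

```python
import math

# Hypothetical, hand-made 3-dimensional "document vectors" for illustration only.
vectors = {
    "quarterback": [0.9, 0.1, 0.3],
    "touchdown":   [0.8, 0.2, 0.4],
    "goalie":      [0.1, 0.9, 0.3],
    "goal":        [0.2, 0.8, 0.4],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c, vocab):
    """Solve a : b :: c : ? by finding the nearest cosine neighbour of (b - a + c)."""
    target = [vocab[b][i] - vocab[a][i] + vocab[c][i] for i in range(len(vocab[a]))]
    candidates = {w: vec for w, vec in vocab.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

print(analogy("quarterback", "touchdown", "goalie", vectors))  # → goal
```

An analogy test over thousands of such tuples is one standard way to score how well learned vectors capture relationships between terms.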

Several deep neural networks with over 90 layers were trained to determine if a player was going to boom, bust, or play with an injury.  The activation functions of the neural networks were a mixture of tanh and rectified linear units (ReLU).  Batch normalization was used to speed up training, while dropout nodes helped prevent overfitting.  The final layer of each neural network was a sigmoid, since we converted the problem to a binomial classification.  Stochastic gradient descent was the optimizer over the binary cross-entropy loss function.  On this extremely challenging problem, our play-with-injury model and play-meaningful-minutes model garnered 66% and 82% accuracy on held-out evaluation sets, respectively.  The boom and bust model selection was difficult due to the imbalanced classes, and was a tradeoff between accuracy and distribution.  To avoid over-predicting the number of boom players week over week, a boom model with a lower accuracy of 67% was selected.  The corresponding bust model produced a 57% accuracy measure.  All four models selected a reasonable number of players that was consistent with historical performance, while maintaining high enough accuracy for meaningful insights.
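As a toy illustration of the final sigmoid layer, the binary cross-entropy loss, and a single stochastic gradient descent step, here is a hedged sketch in plain Python. The production models were deep networks with over 90 layers; everything below is a single-weight stand-in.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, the final-layer activation for a binomial classifier."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy loss over a batch."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip for numerical stability
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# One SGD step on a single-weight logistic unit, for illustration.
w, lr = 0.0, 0.1
x, y = 1.0, 1.0                 # one training example: a "boom" (label 1)
p = sigmoid(w * x)              # forward pass -> 0.5
grad = (p - y) * x              # dL/dw for sigmoid + cross-entropy -> -0.5
w -= lr * grad                  # w becomes 0.05
```

The convenient property of pairing a sigmoid output with cross-entropy is that the gradient collapses to the simple residual form `(p - y) * x`, which is what makes the update above so compact.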

Each of the deep player classifiers was normalized so that they could be compared.  Though the models used different acceptance thresholds, the end results were scaled so that the models appeared to all use a 50% threshold.  Finally, a random forest model was trained to map deep classifier confidence to player percentage.  The player percentage provided a number related to the expected percentage of players that would be a boom or bust. The boom and bust percentages are displayed on the ESPN Fantasy Football television show and through a real-time Watson Analytics dashboard.
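One simple way to realize this rescaling is a piecewise-linear map that sends each model's own acceptance threshold to exactly 0.5. The function below is a hypothetical sketch of the idea, not the production scaling:

```python
def rescale_confidence(conf, threshold):
    """Map a raw model confidence so its acceptance threshold lands at 0.5.

    Confidences below the threshold map linearly onto [0, 0.5) and those at
    or above it onto [0.5, 1.0], so differently tuned classifiers can be
    compared on a common 50% scale. Illustrative sketch only.
    """
    if conf < threshold:
        return 0.5 * conf / threshold
    return 0.5 + 0.5 * (conf - threshold) / (1.0 - threshold)

# A boom model tuned to fire at 0.3 and a bust model tuned at 0.7
# both map their own thresholds to exactly 0.5:
print(rescale_confidence(0.3, 0.3))  # → 0.5
print(rescale_confidence(0.7, 0.7))  # → 0.5
```

After this step, a displayed value above 50% means "this model would accept the player," regardless of where each model's internal threshold was tuned.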

*Courtesy ESPN: The image is from Matthew Berry’s ESPN Fantasy Football show.

The final portion of the machine learning pipeline produced actual point projections for each player.  A support vector machine found the correct multiple regression function based on player position.  The predictors of the model included the output of the deep learning layers as well as structured statistics such as player age, height, and percentage owned.  A projected point total for the week was determined for each player and run through a simulation to produce a point spread.  Historical point actuals for a player over the past several weeks were combined with actual scores from similar players with the same position and projections.  The projected point total of the player was added to the historical data to bring the mean closer to the system’s prediction.  The best fit among 25 different probability distribution functions (PDFs) was chosen to represent the player’s score spread potential.  A random draw of 1,000 elements from the best distribution was used as the official score spread for the player.
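The simulation step can be sketched as follows, using a normal distribution as a stand-in for whichever of the 25 candidate PDFs fits best. All of the numbers, the function name, and the fitting choice are illustrative:

```python
import random
import statistics

def score_spread(historical, similar, projection, n=1000, seed=42):
    """Build a simulated point spread for one player (illustrative sketch).

    Combines the player's recent actual scores with scores from similar
    players, anchors the mean toward the system's projection, fits a normal
    distribution (a stand-in for the best of 25 candidate PDFs), and draws
    n samples as the score spread.
    """
    sample = historical + similar + [projection]
    mu = statistics.mean(sample)
    sigma = statistics.stdev(sample)
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Hypothetical recent scores, similar-player scores, and a projection:
spread = score_spread([12.0, 18.5, 9.0], [14.0, 11.5, 16.0], 15.0)
```

Appending the projection to the sample before fitting is what pulls the simulated mean toward the system's prediction, as described above.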

Artificial Intelligence Visualization with IBM Watson

The predictions for each player from the machine learning pipeline are visually displayed on a player card.  The player card shows the simulated score spread in a plot of points versus probability.  The boom and bust flags depict the deep learning models’ boom and bust percentages, which are best used when comparing players.  The high and low scores are likelihood bounds on the player’s fantasy football score.  The likelihood to play is an estimate that the player will be included in the lineup and get meaningful touches.
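Here is one way such likelihood bounds might be computed from a simulated spread, using the 10th and 90th percentiles as hypothetical stand-ins for the card's low and high scores (the actual bounds used on the card may differ):

```python
def likelihood_bounds(spread, low_pct=0.1, high_pct=0.9):
    """Return (low, high) score bounds from a simulated point spread.

    Sorts the spread and reads off the requested percentiles. The 10th and
    90th percentiles are illustrative choices, not the production values.
    """
    ordered = sorted(spread)
    n = len(ordered)
    low = ordered[int(low_pct * (n - 1))]
    high = ordered[int(high_pct * (n - 1))]
    return low, high

# With a uniform spread of 0..100, the bounds land at 10 and 90:
low, high = likelihood_bounds(list(range(101)))
```

Percentile bounds like these communicate the spread's width at a glance, which is the point of the high and low scores on the card.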


Each player can also be compared to another player.  In this mode, player distributions and relative boom and bust percentages are easily compared.  For example, Player 1 and Player 2 can be visually compared.  The graphs indicate that Player 2 will generally score less than Player 1.  However, Player 1’s spread is wider, with increased variability.


A second tab called “News Trend” depicts the textual evidence that Watson found in unstructured text.  The graphs show the recent sentiment trend across hundreds of news and social content sources.  An emerging application feature, News Trend will show trending topics with evidence that can be explored by fantasy football managers.


A second real-time dashboard is available for the ESPN editorial staff.  If any player’s projection is updated, an IBM Cloud service called Data Connect pushes data from DB2 on Cloud to Watson Analytics for analysis.  Players are grouped into categories such as top projections, solid performers, upside gain, downside loss, risky moves, and star finder.  Histograms that can be filtered by position, percent ownership, and projections allow a quick comparison of players.

IBM Cloud

Millions of fantasy sports owners and managers have hundreds of critical questions to answer before selecting their team. Who will score the most points this week? Which player will be a bust or a breakout? Will any players be sleepers? Do any players have injuries that are going to impact their play? Which players should I start to counter my opponent’s team?

The scale of users, data, and inquiry must run on a distributed, redundant, and scalable hybrid cloud architecture.  Any service disruption or big data failure would produce inaccurate predictions for fantasy owners.


The Fantasy Insights with Watson application is built from 4 Cloud Foundry applications running in 3 production IBM Cloud regions and 1 development IBM Cloud region.  The Python application is the machine learning engine, which pulls dozens of models from Object Storage. The Python application runs in Dallas, Germany, and London to sustain continuous availability, and reads and writes to a highly available DB2 on Cloud instance.  Within the Python engine, the application is multithreaded, with 20 agents pulling data from Watson Discovery and Twitter and an additional 20 agents running the data through the machine learning pipeline.  These parallel actions allow the application to maintain near-real-time performance for player predictions.  An Application Programming Interface (API), written with the Swagger spec, is exposed for service consumption.  Individual players can be updated through the API, which also exposes job queue status.

In addition to the Python app, a Perl crawler runs on an IBM private cloud to discover any players that need to be updated through the Python machine learning pipeline.  When a player is found, the Perl crawler sends a request to the Python application to create a job.  The identified player is added to a work queue from which worker threads pull jobs for processing.
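The crawler-to-engine hand-off can be sketched with Python's standard library `queue` and `threading` modules. The names, the four-worker count, and the stubbed pipeline below are illustrative, not the production code:

```python
import queue
import threading

# A minimal sketch: the crawler enqueues player IDs and worker threads
# pull them through a (stubbed) machine learning pipeline.
jobs = queue.Queue()
results = []
lock = threading.Lock()

def worker():
    while True:
        player_id = jobs.get()
        if player_id is None:          # sentinel: shut this worker down
            jobs.task_done()
            break
        prediction = f"projection:{player_id}"  # stand-in for the real pipeline
        with lock:
            results.append(prediction)
        jobs.task_done()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for pid in ["p1", "p2", "p3", "p4", "p5"]:  # jobs from the crawler
    jobs.put(pid)
for _ in threads:                            # one sentinel per worker
    jobs.put(None)
jobs.join()
for t in threads:
    t.join()
```

A thread-safe queue decouples the rate at which the crawler discovers players from the rate at which the pipeline can process them, which is the design property the paragraph above relies on.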

Before the Python application runs a nightly refresh of all players, an R and SPSS Modeler job runs to update the player statistics.  League ownership percentage, injury status, and trades all impact the projections.  When the SPSS Modeler job is complete, the Python application runs a full refresh of the top 400 ESPN Fantasy Football players.  At the conclusion of a nightly run, a player projection report is sent to ESPN and IBM.

A Node.js application acts as a façade for business rule integration and the construction of a widget.  It provides a Swagger API that is called by the Python application to update all projection and unstructured data on Akamai’s NetStorage.  The Node.js application retrieves data from DB2 on Cloud and constructs JSON data, which is uploaded to NetStorage for immediate publishing through the Akamai Content Delivery Network (CDN).  It then calls a second Node.js application, the integration application.
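The widget payload might look something like the following sketch, shown in Python for consistency with this article's other examples. The field names are hypothetical, not ESPN's actual schema:

```python
import json

# Hypothetical shape of the per-player data published to the CDN origin.
player = {
    "playerId": 12345,
    "projection": 14.7,
    "boomPct": 22.0,
    "bustPct": 9.5,
    "sentimentTrend": [0.1, 0.3, 0.2],
}

# Compact separators keep the published file small for CDN delivery.
payload = json.dumps({"players": [player]}, separators=(",", ":"))
```

Publishing pre-built JSON to the origin means the client browser only has to fetch and render static data, with no per-request database work behind the CDN.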

The integration application’s sole responsibility is to call the IBM Cloud service Data Connect.  A job is scheduled on Data Connect to update the Watson Analytics dashboard in real time.

An IBM premier partner, Akamai, provides a network of thousands of edge servers.  The caching layer protects the Cloud Foundry applications, which must sustain heavy big data load from millions of user hits.  The edge servers serve data from the origin, NetStorage, where projection, sentiment, injury, player, and widget data is stored.  When a user loads the Fantasy Insights player card, the JavaScript libraries are pulled from NetStorage so that the client browser handles the rendering work.

The availability of the entire solution is maintained by monitoring services on the IBM Cloud.  The New Relic service monitors every application through pings and request results via the Swagger APIs.  Through webhooks, the New Relic service pushes any issues to the IBM Alerting service, which sends text messages, voice calls, and Slack messages so that IBM team members can respond rapidly.

My ESPN Fantasy Football Results

Throughout the ESPN Fantasy Football season, I used Fantasy Insights with Watson to set my roster from week to week.  Within a 14-team league with a Points Per Reception format, my final regular season record was a perfect 13-0. When I was deciding which players to add to my roster from the waiver wire, I compared player performance and predictions.


Through some close matchups and a bit of favorable luck, I was undefeated through the regular season.  The application added empirical rigor to my decisions. Stay tuned for my team’s playoff performance update!

And a special thank you to Gray Cannon, Micah Forster, Jeffrey Gottwald, Stephen Hammer, John Kent, Coleman Newell, Elizabeth O’Brien, Jeff Powell, and Patrick Veedock for reviewing and editing this article.
