Gain insight into ESPN fantasy football with Watson

With contributions by Stephen Hammer, Andy Wismar, Micah Forster, Jeff Powell, Sai Gudimetla, Gray Cannon, David Reedy, Miyuki Dalton, Eris Calhoun

When you play ESPN fantasy football, you can choose to either get better or get bitter. If you choose to get bitter, you might as well taste a lemon, throw a dart, and let chance guide your team management decisions. Alternatively, you can base your fantasy football team decisions on our world-class, enterprise-grade AI Watson Insights system. Watson™ will help your team boom by guiding you to avoid players that are likely to bust. Watson will read millions of articles, watch thousands of videos, and listen to hundreds of podcasts and distill it all into actionable evidence that can ultimately help you win. Fortunately, with our hybrid cloud architecture, you — along with millions of other users — can access our system from anywhere around the world at any time. Check out the video for an overview.

Here in Part 1 of 4, we will describe the system architecture, discuss the hybrid cloud approach, show our monitoring strategy and introduce the fair machine learning pipeline.

Overall system architecture

The architecture and techniques of this system are designed to deliver AI insights at scale with continuous availability for the fan. On the IBM Cloud, this is composed of eight apps running across three regions. In the Dallas region, all applications are deployed, monitored, and running. In London and Germany, the system is in a warm standby mode. The underlying databases, such as Db2®, are highly available with HADR spread across all of the sites. Sitting between our user experience and AI components is a web acceleration tier comprising two stacked content delivery networks (CDNs). If our active Dallas region has an issue or we need to perform maintenance, the consumer-facing experience is not impacted because our AI insights are still available across hundreds of edge servers, which provide a continuously available user experience with a highly available back end.

Figure 1

The applications that generate our AI insights include Cloud Foundry applications, as well as stateless computing functions. This system is written in Python and JavaScript. The Python applications handle the AI algorithms, including data cleansing, normalization, model training and test, fairness evaluation, and multimedia management. The JavaScript applications that are run through Node.js combine finished data artifacts to generate content in support of the user experience. Finally, the front-end React application is built on demand and released through the hybrid CDN (ESPN’s and IBM’s). If any of the applications experience slowness or a service outage, our continuous service monitoring will send out alerts via email, Slack, and mobile.

The Python application, Cloud Foundry Natural Language Container, encapsulates the majority of the machine learning. This app pulls news articles from Watson Discovery and enrolls the articles into a custom collection with our statistical entity detector. The engine pulls sources from IBM Cloud Object Storage, Twitter, ESPN, and RotoWire. Unstructured information is combined with traditional statistics so our system can reason about players. The job runs every day as a batch and periodically throughout the day as each player’s state changes. The incremental runs are driven by a Python app that detects whether a player’s projections or actuals have changed and, if so, sends a POST request to the Cloud Foundry Natural Language Container app, which runs the player through the machine learning pipeline. In parallel, and on the hour, a player pre-processor updates player information, such as trade, injury, suspension, and bye status. When data changes, a stateless function updates IBM Cognos® Dashboard Embedded. The dashboards are an important experience for ESPN analysts, who check the tool for trending topics and players.
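To make the incremental trigger concrete, here is a minimal Python sketch of the change-detection step, not our production code. The endpoint URL, the `playerId` payload field, and the function names are hypothetical placeholders chosen for illustration. The sketch builds the POST request but does not send it.

```python
import json
from urllib import request

# Hypothetical endpoint; the real container URL is not given in this article.
PIPELINE_URL = "https://nl-container.example.com/players"


def changed_players(previous, current):
    """Return the IDs of players whose projections or actuals differ."""
    changed = []
    for player_id, stats in current.items():
        # A new player (missing from `previous`) also counts as changed.
        if previous.get(player_id) != stats:
            changed.append(player_id)
    return sorted(changed)


def build_pipeline_request(player_id, url=PIPELINE_URL):
    """Build (but do not send) a POST asking for one player to be re-run."""
    body = json.dumps({"playerId": player_id}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


previous = {
    "p1": {"projection": 12.4, "actual": 0.0},
    "p2": {"projection": 8.1, "actual": 0.0},
}
current = {
    "p1": {"projection": 12.4, "actual": 0.0},
    "p2": {"projection": 9.3, "actual": 0.0},
    "p3": {"projection": 5.0, "actual": 0.0},
}

for pid in changed_players(previous, current):
    req = build_pipeline_request(pid)
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # request.urlopen(req) would actually submit the player to the pipeline.
```

Comparing whole stat records keeps the detector simple: any difference in a projection or an actual, or the appearance of a new player, queues that player for reprocessing.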

At the same time, the Python app Cloud Foundry Multi-Media Container processes videos and podcasts. For videos, each MP4 discovered across our selected sources is pulled down and split into its sound component, such as an MP3 file. The sound file is submitted to Watson Media Cloud for transcription. Each podcast MP3 goes through a similar series of steps to obtain a transcription file. Then each transcription file is ingested into a custom Watson Discovery collection for machine reading. All player entities are found and associated with the transcript file. From there, the system associates players with each video for evidence retrieval.

All of the converted text from the video and podcast transcripts, as well as from articles, is used as the basis of the machine learning pipeline. The textual data is turned into predictors to determine player boom, bust, likelihood to play with an injury, and likelihood to play meaningful minutes. The open source AI Fairness 360 library identifies and mitigates bias present within the output of the boom and bust models. The effect of a favorable label or a team that sorts players into unfairly treated groups is dampened through slight modifications of the boom and bust probabilities. The fair scores are then used to create projection distributions that are monitored for bias with IBM OpenScale™.
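One common way to quantify this kind of group bias is the disparate impact ratio, a metric that AI Fairness 360 reports. The Python sketch below computes it by hand rather than through the library. The team names and boom labels are made-up sample data, and the sketch makes no claim about which metric or mitigation algorithm our pipeline actually uses.

```python
def favorable_rate(rows, team):
    """Share of a team's players that received the favorable (boom) label."""
    labels = [row["boom"] for row in rows if row["team"] == team]
    return sum(labels) / len(labels)


def disparate_impact(rows, unprivileged_team, privileged_team):
    """Ratio of favorable rates; 1.0 means both groups are treated alike."""
    return (favorable_rate(rows, unprivileged_team)
            / favorable_rate(rows, privileged_team))


# Made-up predictions: 1 = predicted boom, 0 = not a boom.
predictions = [
    {"team": "popular", "boom": 1}, {"team": "popular", "boom": 1},
    {"team": "popular", "boom": 0}, {"team": "popular", "boom": 0},
    {"team": "obscure", "boom": 1}, {"team": "obscure", "boom": 0},
    {"team": "obscure", "boom": 0}, {"team": "obscure", "boom": 0},
]

ratio = disparate_impact(predictions, "obscure", "popular")
print(f"disparate impact: {ratio:.2f}")
```

A ratio well below 1.0 suggests the less popular team receives boom labels less often than the popular one, which is the kind of gap a mitigation step would try to narrow.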

After the machine learning pipeline is finished, the Node.js Content Generation app merges the content into JSON files for posting to IBM Cloud Object Storage. The Cloud Object Storage content is distributed through the IBM Content Delivery Network and the ESPN delivery network for two layers of data protection and availability. When cached data outlives its time-to-live (TTL) limit, request traffic falls through to the origin, Cloud Object Storage, and the response is cached again on the content delivery networks. As data is generated by the Node.js app, any existing data within Cloud Object Storage is replaced by the newer data and pushed through the delivery networks.

All of the apps are monitored with IBM continuous availability and alerting so that any failing service can be recovered. At the same time, LogDNA aggregates logging data from our apps so they can be easily managed, indexed, and searched. The logging service helps us identify race conditions or other errors within long-running jobs.

From the user experience perspective, the system builds a React application for deployment within a WebView or iFrame. The components and containers within React increase the team’s development and deployment speed. Within the experience, users can click on a social feature to share player cards. The Node.js social sharing generator app creates a sharable HTML template file. The template file follows Facebook and Twitter rules to reference images to be shared. The timestamp is placed within the HTML file to prevent the social media platforms from using a cached version of the media. The social sharing app then posts a request to the social image generator app to create snapshots of player cards. A custom player view is rendered into a headless Chrome instance, in which two images are produced. The images are stored on Cloud Object Storage and referenced by the HTML file that is shared across Facebook and Twitter.
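The sharable template can be pictured with a short Python sketch (our generator is a Node.js app, so this is only an illustration). The `og:` and `twitter:` meta tags are the standard Open Graph and Twitter card properties. The function name, the sample URL, and the choice to append the timestamp as a `t` query parameter are assumptions, and the sketch assumes the image URL has no existing query string.

```python
import html
import time


def build_share_page(player_name, image_url, timestamp=None):
    """Return a minimal HTML page whose meta tags point at a player card image."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    # Assumption: the timestamp is carried as a query parameter so that
    # social platforms treat each share as a fresh image URL.
    image = html.escape(f"{image_url}?t={ts}", quote=True)
    title = html.escape(f"{player_name} | Watson Insights", quote=True)
    return "\n".join([
        "<!DOCTYPE html>",
        "<html>",
        "<head>",
        f'  <meta property="og:title" content="{title}">',
        f'  <meta property="og:image" content="{image}">',
        '  <meta name="twitter:card" content="summary_large_image">',
        f'  <meta name="twitter:image" content="{image}">',
        "</head>",
        "<body></body>",
        "</html>",
    ])


page = build_share_page(
    "Player One",
    "https://cos.example.com/cards/player-one.png",  # hypothetical image URL
    timestamp=1700000000,
)
print(page)
```

Because the timestamp changes with every generated page, a platform that caches by URL fetches the newest player card instead of reusing an older snapshot.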

An important piece of our architecture includes a Python crawler. Throughout the day, player statistics can be updated. As these updates occur, the crawler discovers the differences between the Watson Insights players and those from ESPN. The crawler posts a request to the Cloud Foundry natural language container to process the player through the entire machine learning pipeline. As a result, the Watson Insights system maintains the most current player states.

Hybrid cloud

Running our system across different clouds, including third-party clouds, is critical to sustaining a large enterprise-scale AI computing system like Watson Insights. The majority of our services, apps, and databases reside within the IBM public cloud. The deployed compute systems are spread across regions like Dallas, London, and Germany. Different components, such as IBM Cloud Object Storage and Db2, are replicated across the U.S. region. Because our apps are Cloud Foundry apps and can support containerization technology, the workloads have the agility to be moved anywhere. Within the Watson Insights system, services are moved across virtualized and containerized resources. As resources are shifted, each can be vertically scaled by dynamically adding memory, compute, or both. To parallelize jobs, any number of instances can be created on demand.

Figure 2

The APIs integrated within Watson Insights are spread across different clouds. A dozen APIs are available through both ESPN’s public and private clouds. The Watson Insights service relies on the services to pull data that supports the user experience, as well as the machine learning. Other APIs integrated within the system include Twitter and RotoWire. Both data feeds provide additional predictors for the machine learning pipeline.

The web acceleration tier comprises four cloud layers. First, the origin for content is held in IBM Cloud Object Storage. The service is distributed across the IBM U.S. public cloud. Next, the IBM content delivery network protects the origin from billions of hits every day. The configuration and monitoring features are accessible on the IBM public cloud while the midgress and edge servers are provided by the Akamai cloud. Over one week, the system handled 3.65 billion hits and served 214.4 TB of data with a hit ratio of 82.82 percent. During that period, the majority of traffic originated from North America, with a small number of users coming from Europe, Australia, Asia, India, South America, and Japan.
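A quick back-of-the-envelope calculation shows what those weekly figures imply. The Python sketch below assumes that the hit ratio is the share of requests answered from cache, that the remainder is forwarded toward the origin, and that terabytes are decimal (10^12 bytes). None of these definitions is spelled out above, so treat the derived numbers as rough estimates.

```python
WEEK_SECONDS = 7 * 24 * 3600

hits = 3.65e9        # requests over the week
terabytes = 214.4    # data served over the week
hit_ratio = 0.8282   # assumed: fraction of requests served from cache

cache_hits = hits * hit_ratio
forwarded = hits - cache_hits              # requests passed toward the origin
hits_per_second = hits / WEEK_SECONDS      # average request rate
avg_kilobytes = terabytes * 1e12 / hits / 1e3

print(f"served from cache: {cache_hits / 1e9:.2f} billion")
print(f"forwarded to origin: {forwarded / 1e6:.0f} million")
print(f"average rate: {hits_per_second:.0f} requests per second")
print(f"average response size: {avg_kilobytes:.1f} KB")
```

Under these assumptions, roughly 627 million of the 3.65 billion requests reach past the cache, the average load is about 6,000 requests per second, and the average response is just under 59 KB.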

To further insulate the Watson Insights system, the ESPN content delivery network fronts the IBM delivery network. The ESPN content delivery network is supported by Akamai and spread worldwide. The architecture is similar to the IBM content delivery network. Fan-facing apps interface with the ESPN content delivery network edge servers. The general traffic flow is as follows:

  1. Client host renders the React application with
  2. Traffic is forwarded to the ESPN content delivery network with
  3. Traffic is forwarded to the IBM content delivery network with
  4. Traffic is forwarded to IBM Cloud Object Storage.

The data TTL is set according to data type and put into a header by the content generation Node.js app. The DNS entries are CNAME’d for logical and cookie inheritance properties.
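As an illustration of a per-type TTL, the following Python sketch maps a data type to a `Cache-Control` header (our content generator is a Node.js app, so this shows the idea rather than our code). The data type names, the TTL values, and the use of `Cache-Control: max-age` as the carrying header are all assumptions.

```python
# Hypothetical data types and TTLs in seconds; real values are not published here.
TTL_SECONDS = {
    "player_card": 300,
    "projection": 900,
    "media": 86400,
}
DEFAULT_TTL = 60


def cache_headers(data_type):
    """Return response headers carrying the TTL for one kind of JSON artifact."""
    ttl = TTL_SECONDS.get(data_type, DEFAULT_TTL)
    return {
        "Content-Type": "application/json",
        "Cache-Control": f"public, max-age={ttl}",
    }


for data_type in ("player_card", "media", "unknown_type"):
    print(data_type, cache_headers(data_type)["Cache-Control"])
```

Short TTLs suit data that changes as player states change, while long TTLs suit media that rarely changes; once a TTL lapses, the next request falls through the delivery networks to Cloud Object Storage.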

Monitoring and alerting

All of the applications within Watson Insights are monitored for outages and stability. With dozens of services and dependencies, instrumenting the application to find vulnerability points and generating a first response under a service failure is critical to sustaining a continuously available service to the end user.

Figure 3

Each of the Node.js and Python apps is monitored by the IBM availability service. Any number of tests can be set up and monitored for response patterns. The tests can originate anywhere in the world and run at different sampling rates. Over time, the monitoring service aggregates the results to report service availability.
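The aggregation step can be sketched in a few lines of Python. This is a simplified model, not the monitoring service’s actual calculation: it treats availability as the percentage of synthetic tests that passed, and the record fields and sample values are hypothetical.

```python
from collections import defaultdict


def availability(results):
    """Percentage of test runs that passed, or None when there are no runs."""
    if not results:
        return None
    passed = sum(1 for result in results if result["ok"])
    return 100.0 * passed / len(results)


def availability_by_location(results):
    """Group test runs by originating location and score each group."""
    grouped = defaultdict(list)
    for result in results:
        grouped[result["location"]].append(result)
    return {location: availability(runs) for location, runs in grouped.items()}


# Hypothetical synthetic-test results from two test locations.
runs = [
    {"location": "dallas", "ok": True},
    {"location": "dallas", "ok": True},
    {"location": "london", "ok": True},
    {"location": "london", "ok": False},
]

print(availability(runs))
print(availability_by_location(runs))
```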

Figure 4

If a service triggers an alerting rule, such as an HTTP response with a status code other than 200 or a lagging response time, alerts are sent to IBM alerting. Policies route the messages to human operators who can act on the Watson Insights system alerts. The alerts can be delivered through webhooks to tools like Slack, as well as by SMS, phone call, and email. If operators do not acknowledge a problem, secondary escalation messages can be configured and sent to second-line support staff.
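The rule-and-escalation logic can be sketched as follows in Python. The response-time threshold, the escalation delay, the tier names, and the service name are hypothetical; the real policies are configured in the alerting service rather than written as code.

```python
from dataclasses import dataclass

MAX_RESPONSE_MS = 2000    # hypothetical response-time threshold
ESCALATE_AFTER_MIN = 15   # hypothetical delay before second-line escalation


@dataclass
class Check:
    service: str
    status_code: int
    response_ms: float


def violations(check):
    """List the alerting rules that a single health check breaks."""
    reasons = []
    if check.status_code != 200:
        reasons.append(f"HTTP {check.status_code}")
    if check.response_ms > MAX_RESPONSE_MS:
        reasons.append(f"slow response ({check.response_ms:.0f} ms)")
    return reasons


def route(check, minutes_unacknowledged=0):
    """Return an alert for the right support tier, or None if the check is healthy."""
    reasons = violations(check)
    if not reasons:
        return None
    escalated = minutes_unacknowledged >= ESCALATE_AFTER_MIN
    tier = "second-line" if escalated else "first-line"
    return {"service": check.service, "tier": tier, "reasons": reasons}


print(route(Check("content-generator", 200, 120)))
print(route(Check("content-generator", 503, 120)))
print(route(Check("content-generator", 200, 5000), minutes_unacknowledged=20))
```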

Figure 5

Machine learning pipeline flow

The machine learning pipeline was designed and built over several phases. First, Watson is trained to read textual data from the perspective of fantasy football. Next, the system begins to comprehend the data based on semantic relationships among keywords, concepts, and entities. Finally, Watson understands the patterns through millions of words combined with traditional fantasy football stats for interpretable insights.

Figure 6

Machine reading

Watson is trained by several human annotators who associate text within 1,200 articles with 13 entity types. The 1,200 articles are a sample drawn from 5,568,714 documents about specific players over a previous fantasy football year. Ten football dictionaries were used to pre-annotate the articles. The approach accelerated the annotation process by suggesting textual annotations for a human to review and correct. On a daily basis, the group of human annotators met to discuss their relative understanding of each entity. The kappa statistic was used to measure the group’s agreement on each entity. Over time, the annotators converged on a common definition of each entity. Entities included player, team, contract, injury, performance, gear, etc.
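For readers unfamiliar with the kappa statistic, the Python sketch below computes Cohen’s kappa, the two-annotator form: observed agreement corrected for the agreement expected by chance. Which kappa variant our annotation team used is not stated here (a group of more than two annotators would typically call for a multi-rater form such as Fleiss’ kappa), and the labels are made-up sample data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random with
    # their own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


annotator_1 = ["player", "player", "team", "injury", "team", "player"]
annotator_2 = ["player", "team", "team", "injury", "team", "player"]

print(f"kappa = {cohens_kappa(annotator_1, annotator_2):.3f}")
```

A kappa of 1.0 means perfect agreement and 0.0 means agreement no better than chance, so a value that rises over successive annotation rounds indicates the annotators are converging on shared entity definitions.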

After the documents were annotated, a statistical entity recognizer was trained. During training, the results of cross-fold validation indicated that the entity model was ready for deployment. The model was deployed to Watson Discovery for custom machine reading.

Machine comprehension

Each of the documents found by querying Watson Discovery news or transcripts from podcasts and videos was enrolled into a Watson Discovery custom collection. A second query was issued against the custom collection to extract fantasy football-related entities. After machine reading, each document had results containing a list of keywords, concepts, and entities. To semantically understand each word, two Word2Vec models projected the words into a high-dimensional space for spatial representation.

The broad Word2Vec model was trained on 94 GB of text that represented slang and general use of fantasy football terms. The precise Word2Vec model was focused on football dictionaries. Each word from each of the lists of keywords, concepts, and entities was input into both models. The results created a large floating-point feature representation of the words. The vectors within each list were averaged together to provide Watson with content comprehension, giving Watson an understanding of the word meanings. The three averaged vectors were then combined in preparation for machine understanding.
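The averaging and combining steps can be sketched in plain Python with toy two-dimensional vectors standing in for the real models. The sketch assumes that a word’s broad and precise vectors are concatenated, that each of the three lists is averaged separately, and that the three averages are concatenated into one feature vector. These are plausible readings of the description above, not a confirmed account of our implementation.

```python
def word_vector(word, broad, precise):
    """Concatenate a word's vectors from both models; None if either lacks it."""
    if word not in broad or word not in precise:
        return None
    return broad[word] + precise[word]


def average_vectors(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def document_features(keywords, concepts, entities, broad, precise, dims):
    """Average each word list separately, then join the three averages."""
    features = []
    for words in (keywords, concepts, entities):
        vectors = [word_vector(w, broad, precise) for w in words]
        vectors = [v for v in vectors if v is not None]
        # A list with no known words contributes a zero vector.
        features += average_vectors(vectors) if vectors else [0.0] * dims
    return features


# Toy stand-ins for the broad and precise models (2 dimensions each).
broad = {"touchdown": [1.0, 0.0], "injury": [0.0, 1.0], "quarterback": [1.0, 1.0]}
precise = {"touchdown": [2.0, 0.0], "injury": [0.0, 2.0], "quarterback": [2.0, 2.0]}

features = document_features(
    keywords=["touchdown", "injury"],
    concepts=["injury"],
    entities=["quarterback", "unknown-word"],
    broad=broad,
    precise=precise,
    dims=4,
)
print(len(features), features)
```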

Before the next phase, we had to evaluate and ensure that Watson was comprehending the words. Two tests were applied to the Word2Vec models. The first was a keyword lookup test. For example, given a player’s name, the result should be the player’s team name. In the keyword test between players and teams, 80 percent of the questions were answered correctly, where an answer counted as correct if it appeared in the top 1 percent of ranked answers. In the keyword test between teams and locations, 75 percent were correct under the same top 1 percent criterion. Next, Watson was given two analogy tests, one between players and teams and one between teams and locations. We found that an answer within the top 500 results is within the top 1 percent of the data. In the player-to-team test, 100 percent of the correct analogies were found. In the team-to-location test, 93.48 percent were correct. The results of machine comprehension were outstanding.
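Both tests amount to ranking candidate words by similarity and checking whether the expected answer lands within a cutoff. The Python sketch below shows that idea with cosine similarity over toy vectors. The vectors, the word names, and the use of the common "b - a + c" vector arithmetic for analogies are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ranked_neighbors(query, vectors, exclude=()):
    """Vocabulary words ordered from most to least similar to the query vector."""
    candidates = [word for word in vectors if word not in exclude]
    return sorted(candidates, key=lambda w: cosine(query, vectors[w]), reverse=True)


def top_k_accuracy(questions, vectors, k):
    """Share of (word, expected) pairs whose expected answer ranks in the top k."""
    hits = 0
    for word, expected in questions:
        ranked = ranked_neighbors(vectors[word], vectors, exclude={word})
        hits += expected in ranked[:k]
    return hits / len(questions)


def analogy_query(a, b, c, vectors):
    """Query vector for 'a is to b as c is to ?', computed as b - a + c."""
    return [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]


# Toy embedding space with hypothetical words.
vectors = {
    "player_1": [1.0, 0.1], "team_1": [0.9, 0.2],
    "player_2": [0.1, 1.0], "team_2": [0.2, 0.9],
    "stadium": [0.7, 0.7],
}

lookup = [("player_1", "team_1"), ("player_2", "team_2")]
print("lookup accuracy:", top_k_accuracy(lookup, vectors, k=1))

query = analogy_query("player_1", "team_1", "player_2", vectors)
inputs = {"player_1", "team_1", "player_2"}
print("analogy answer:", ranked_neighbors(query, vectors, exclude=inputs)[0])
```

In the full-scale tests the cutoff `k` would be the top 1 percent of the vocabulary rather than a single word.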

Machine understanding

After the words were converted into semantic vectors, four deep learning models were applied. Watson estimates the likelihood of a player booming, busting, playing with a hidden injury, or playing meaningful minutes. Four 96-layer deep neural networks were trained on the previous year’s historical data. The large feature vectors output by the Word2Vec process were the inputs for the models. A mixture of activation functions, such as ReLU, sigmoid, and softmax, was combined in different neural network topologies to find the best model performance. If the output of the model was a boom or bust, we used the AI Fairness 360 library to mitigate any bias associated with a specific team, since some teams are more popular than others. This mitigation slightly alters the output of the neural networks.

The outputs of the four neural networks, along with statistical data, were combined and input into an ensemble of regression functions. The player position determines which regression model is applied to a player’s feature vector. The regression produces a point projection that can be used in a simulation to generate scoring spreads. In addition, Watson uses IBM OpenScale to identify and mitigate bias in scoring projections with respect to player biographics.
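A minimal Python sketch of position-specific regression followed by simulation appears below. The coefficients, the three-element feature layout, the normal distribution, and the choice of the 15th and 85th percentiles as the reported spread are all illustrative assumptions, not the fitted models or the simulation we run.

```python
import random
import statistics

# Hypothetical linear models per position: intercept plus one weight per feature.
MODELS = {
    "QB": {"intercept": 2.0, "weights": [10.0, -6.0, 0.8]},
    "RB": {"intercept": 1.0, "weights": [8.0, -5.0, 0.7]},
}


def project_points(position, features):
    """Pick the regression for the player's position and apply it."""
    model = MODELS[position]
    weighted = sum(w * x for w, x in zip(model["weights"], features))
    return model["intercept"] + weighted


def simulate_spread(projection, spread, runs=10_000, seed=7):
    """Draw simulated scores around the projection and summarize the spread."""
    rng = random.Random(seed)
    scores = [rng.gauss(projection, spread) for _ in range(runs)]
    cuts = statistics.quantiles(scores, n=20)  # cut points at 5%, 10%, ..., 95%
    return {
        "low": cuts[2],     # 15th percentile
        "median": statistics.median(scores),
        "high": cuts[16],   # 85th percentile
    }


# Assumed feature layout: [boom probability, bust probability, season average].
projection = project_points("QB", [0.3, 0.1, 20.0])
print(f"projection: {projection:.1f}")
print(simulate_spread(projection, spread=6.0))
```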

Each value computed by the deep learning algorithms goes through post-processing. First, fair results for boom and bust are saved into the database. Next, the boom and bust features are normalized so that they relate to the shape of the score distribution below the 15th percentile and above the 85th percentile. Finally, all of the machine learning results are saved and ready to be processed by the Node.js content generator for final end-user delivery.

Let’s play

If you have not already done so, try out Watson Insights through the ESPN fantasy football app. Leverage the power of AI to increase your knowledge about players before starting them. Watson Insights is here to help you win (#WinWithWatson).

In the next article in this four-part series, we discuss the social sharing of your player cards. Get ready to send your friends some Watson smack talk!