Contributions by Micah Forster, Jeff Powell, Sai Gudimetla, Mohit Trivedi, Gray Cannon, and Stephen Hammer.
Picking players to start in fantasy football can be a daunting and challenging experience. Now more than ever, the field of play is constantly changing with players and entire games being affected by COVID-19 and an unprecedented number of injuries. The emergence of lesser-known players from team depth charts and numerous schedule changes further complicate team manager decisions. Team managers must quickly assess their players’ potential within a changing environment. Prior beliefs and traditional player valuations quickly become irrelevant.
To help managers avoid game-time decision paralysis and uninformed decisions, Player Insights with Watson summarizes player capabilities in real time. Managers have the opportunity to quickly compare and analyze players to make decisions about starting or sitting a player with confidence. Adding players from the waiver wire and free agent pool becomes analytic and understandable. The fluid movement of players now becomes an advantage with Player Insights with Watson.
The technology behind Player Insights with Watson uses a combination of AI, natural language processing, and a hybrid cloud to deliver meaningful insights as part of an engaging experience. In part 2 of this three-part series, you’ll find out how.
Figure 1 shows the architecture of the AI system that runs on a hybrid cloud and is powered by Red Hat® OpenShift®. The architecture and techniques of the system are a first of a kind. There are two OpenShift clusters provisioned on IBM Cloud® that run the applications. One is in Dallas and another is in Washington, DC. Both of these clusters are active, and we can direct traffic to one or both. This supports our continuously available architecture. We also have alerts ready for OpenShift services to notify us if a service goes down or becomes unavailable.
Figure 1. Player Insights with Watson system architecture
Within our primary OpenShift cluster, we deploy Docker containers that run over Kubernetes. Our developers can write code and push to a code repository where a runner builds Docker containers and pushes to OpenShift. OpenShift then takes the apps and distributes them across pods over eight workers. In this example, we have five Python and three Node.js apps. The Natural Language Container is the main AI pipeline that runs the machine learning models distributed across our other apps. We use IBM Watson Studio and its AutoAI feature to train our models and deploy a few of them to Watson Machine Learning. The Natural Language Container app connects to Watson Discovery and queries data from the internet. The Multimedia container connects to the ESPN video and podcast holdings. From there, we use the Watson Speech to Text service and the media cloud to translate from sound to text. After normalizing all of our data to text, we measure the player buzz with a custom machine learning model. Next, we push all of the text through our AI pipeline so that Watson can read, understand, and interpret the information. Throughout the day, we automatically determine whether any player states have changed. If so, we request the player data to be run through the AI Pipeline.
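The last step above, rerunning the pipeline only for players whose state has changed, can be illustrated with a minimal sketch. The snapshot format (player name mapped to a status string) and the function name are assumptions made for illustration; the article does not describe how the production system stores or compares player states.

```python
from typing import Dict, List

# Hypothetical snapshot format: player name -> status string.
PlayerStates = Dict[str, str]


def changed_players(previous: PlayerStates, current: PlayerStates) -> List[str]:
    """Return players whose status is new or differs from the previous snapshot."""
    return sorted(
        name
        for name, status in current.items()
        if previous.get(name) != status
    )


# Illustrative snapshots taken at two points in the day.
previous = {"Player A": "active", "Player B": "questionable"}
current = {"Player A": "active", "Player B": "out", "Player C": "active"}

# Only these players would be sent back through the AI pipeline.
to_rerun = changed_players(previous, current)
print(to_rerun)  # ['Player B', 'Player C']
```

Comparing snapshots keeps the expensive pipeline run limited to the players that actually need a refresh.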
The output of the AI player insights is then passed into a debiasing service that we built on top of OpenShift as part of the IBM Cloud Pak® for Data. We are able to deploy Watson Anywhere services such as Watson OpenScale. This gives us flexibility to select any Watson services that we want to use as well as the provision location. Our AI models and data reside within a high availability Db2 on Cloud and IBM Cloud Object Storage. In fact, for our social share feature, we have two apps that create content for social sharing on Twitter and Facebook. The pieces of data are linked and uploaded onto IBM Cloud Object Storage. When a user clicks the user experience share button, the traffic flows to our content delivery networks (CDN). All consumer traffic goes through our CDNs and never directly to any of our services. The user experience is also served through the CDN. Our users can view our AI insights on their mobile device of choice or a desktop. ESPN analysts can view insights to help them determine what to talk about on air or even what topics to report on.
General player insights algorithm
The Player Insights with Watson feature follows nine steps, shown in Figure 2, to produce comparable player statistics that balance understandability and complexity.
Figure 2. The nine steps that generate player insights
First, we taught Watson to understand the domain of fantasy football using Watson Knowledge Studio to generate a custom model for statistical entity detection. Exactly 13 different entity types focused on fantasy football are used for identifying football terms.
After teaching Watson how to read and understand key football terminology from natural language, videos, and podcasts from over 100,000 sources, we wanted to be able to take this data and derive actionable insights from the AI technologies. We trained a deep learning pipeline to interpret the information.
High volume data is retrieved in the form of text, podcasts, and videos. The podcasts are transcribed to text with the Watson Speech to Text service. The videos are split into visual and sound components where the sound is normalized into text similar to the podcasts.
Next, we used our Document2Vector model to vectorize the enrichments returned from Watson Discovery to provide semantic relationships between words. This method is a word encoding approach.
These feature vectors are used in our neural models to help classify a player’s likelihood of boom, bust, playing meaningful minutes, and playing with a hidden injury.
The classifier results along with traditional ESPN stats are used to predict a player’s projection. The projection along with historical information is fit to over 24 probability density functions (PDFs).
Each of the PDFs is used to run a player-based simulation. A sampling draw from the simulation forms the player performance shapes that are used to contrast players.
Now, with the help of AutoAI and OpenScale we are able to streamline our process of constructing and training our models while providing in-depth analysis into why certain decisions were made by our AI. We identify and mitigate bias or unfair predictions along selected attributes such as a player’s team.
When our insights have been processed through our pipeline, they are consumed by the ESPN mobile and desktop apps through our web accelerator tiers. Content delivery networks distribute the content around the world in Edge servers for at-scale serving.
Let’s go into more detail on the AI Pipeline.
The machine learning pipeline is composed of natural language understanding of media sources, deep learning networks, debias algorithms, and player performance spreads. The deep learning models produce player states such as performance boom, bust, play with a hidden injury, or play meaningful touches. The debiasing algorithms include a fairness post processor to account for bias in the media coverage surrounding a player or team. We then produce score spreads for a player projection by finding the best fit probability density function. Through sampling over the PDF, we approximate a mean player performance. The implementation of the machine learning pipeline is supported by five applications, dozens of models, several data sources, and many data science environments.
Natural language understanding of sports data
First, the system had to be taught to read fantasy football content. A novel language model was designed with custom entities and relationships to fit the unique language people use to describe players and teams in the fantasy football domain.
Next, an annotation tool called Watson Knowledge Studio was used by three human annotators to label text within articles as any combination of 13 entity types such as player, team, and performance. With this data, a statistical entity detector was trained and deployed to our system called Watson Discovery that continually ingests content from over 100,000 sources. Podcasts and videos are transcribed and ingested into the Watson Discovery system. The system is able to discover fantasy football entities, keywords, and concepts from the continually updating corpora based on our trained statistical entity model.
Then, the system used a Document2Vector model to understand the natural text from a query. A very specific query was initially issued to Watson Discovery such as “Tom Brady and Patriots and NFL and Football.” If a query did not return at least an experimentally determined 50 documents, the query was broadened until it only had “Tom Brady and NFL.” From the query result, a list of entities, keywords, and concepts for each document was converted to numerical feature vectors. Each of the feature vector groups was averaged together to represent a semantic summarization. All of the feature vector groups from each document were averaged across all documents. The three keyword-, concept-, and entity-averaged feature vectors, along with player biographic data were input into the deep learning portion of the pipeline.
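The query broadening and vector averaging described above can be sketched as follows. This is a minimal illustration, not the production code: the search function is a hypothetical stand-in for the call to Watson Discovery, the intermediate query and the hit counts are invented, and the two-dimensional vectors are toy values.

```python
from typing import Callable, Dict, List, Sequence

MIN_DOCUMENTS = 50  # experimentally determined threshold quoted in the article


def broaden_until_enough(
    queries: Sequence[str],
    search: Callable[[str], List[dict]],
    min_documents: int = MIN_DOCUMENTS,
) -> List[dict]:
    """Try queries from most to least specific; stop at the first with enough hits.

    If no query reaches the threshold, the results of the broadest query are returned.
    """
    results: List[dict] = []
    for query in queries:
        results = search(query)
        if len(results) >= min_documents:
            break
    return results


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of equal-length feature vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


# Hypothetical stand-in for the search service: query -> number of hits.
# The middle query is an assumed intermediate broadening step.
FAKE_HIT_COUNTS: Dict[str, int] = {
    "Tom Brady and Patriots and NFL and Football": 12,
    "Tom Brady and NFL and Football": 31,
    "Tom Brady and NFL": 64,
}


def fake_search(query: str) -> List[dict]:
    return [{"id": i} for i in range(FAKE_HIT_COUNTS[query])]


documents = broaden_until_enough(list(FAKE_HIT_COUNTS), fake_search)
print(len(documents))  # 64

# Toy entity vectors for two documents: average within each document first,
# then average the per-document summaries across all documents.
entity_vectors_by_document = [
    [[1.0, 3.0], [3.0, 5.0]],
    [[0.0, 2.0]],
]
per_document = [mean_vector(vectors) for vectors in entity_vectors_by_document]
entity_summary = mean_vector(per_document)
print(entity_summary)  # [1.0, 3.0]
```

The same two-stage averaging would be repeated for keywords and concepts, giving the three summary vectors that feed the deep learning stage.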
The Document2Vector model was tested with two different types of semantic meaning evaluations. First, an analogy test was provided to the model. If the relation "Player1 is to Team1 as Player2 is to X" is presented to the model, the correct answer for X should be Team2. In the player to team analogy testing, the correct answer was in the top 1% of the data 100% of the time. The team to location analogy was slightly lower, with a 93.48% accuracy because the natural queries were not focused around teams. The second test provided a set of keywords to the model and expected a related word. For example, if Player1 was input into the model, we would expect to see Team1 as output.
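An analogy test of this kind is commonly run with vector arithmetic: the vector Team1 - Player1 + Player2 should land near Team2. The sketch below shows that general technique on hand-made three-dimensional embeddings. The embeddings, the helper names, and the use of cosine similarity are assumptions for illustration; the article does not state how its evaluation was implemented.

```python
import math
from typing import Dict, List

Embedding = Dict[str, List[float]]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def analogy_rank(emb: Embedding, a: str, b: str, c: str, expected: str) -> int:
    """Rank of `expected` (1 = best) for the analogy 'a is to b as c is to ?'."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [word for word in emb if word not in (a, b, c)]
    ranked = sorted(candidates, key=lambda w: cosine(emb[w], target), reverse=True)
    return ranked.index(expected) + 1


def in_top_fraction(rank: int, vocabulary_size: int, fraction: float = 0.01) -> bool:
    """True if the rank falls within the top `fraction` of the vocabulary."""
    cutoff = max(1, math.ceil(vocabulary_size * fraction))
    return rank <= cutoff


# Toy embeddings, invented for this example.
embeddings: Embedding = {
    "Player1": [1.0, 0.0, 0.2],
    "Team1": [1.0, 1.0, 0.2],
    "Player2": [0.0, 0.0, 1.0],
    "Team2": [0.0, 1.0, 1.0],
    "Stadium": [0.5, -1.0, 0.0],
}

rank = analogy_rank(embeddings, "Player1", "Team1", "Player2", "Team2")
print(rank, in_top_fraction(rank, len(embeddings)))  # 1 True
```

With a real vocabulary, the top 1% cutoff covers many candidate words, so the test is passed whenever the expected team appears anywhere within that band.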
The deep learning pipeline phase had four models that were over 98 layers deep. The models were classifiers for each player to determine the probability of a boom, bust, play with a hidden injury, or play meaningful minutes. The probability scores provide a confidence level of player states so that team owners can decide their own risk tolerance.
More specifically, the bust game classifier had an accuracy of 55% with a modest class separation, while the boom classifier had an accuracy of 67%. The bust classifier was optimized on real-world player bust distribution and accuracy because players with high bust probabilities significantly overscored their projections on average. The bust players that were missed and marked incorrect were very close to the binary threshold of 0.5. Further, the negative predictive value of the bust model is 85.5%, and it produces a real-world percentage of bust players of 12%. As a tradeoff, we accepted a lower accuracy because overpredicting busts would be worse. For this model, accuracy is not as meaningful an evaluation metric as the negative predictive value and the percentage of players predicted to be a bust.
The play with injury classifier had an accuracy of 77% with a positive predictive value of 68.1%. The positive predictive value is very important for this classifier so that we know if a player is going to play with a hidden injury. The play meaningful minutes model produced an accuracy of 91.4%. The output of the class and probability provides valuable predictors for the score projections as well as insights about each football player. From a real-world distribution of players that boom or bust, we were close to our objectives. Between 12% and 16% of players generally boom while 30% can bust week over week. This was important so that consumers do not lose confidence in our system.
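The metrics quoted above (accuracy, negative predictive value, positive predictive value, and the share of players predicted positive) all derive from a confusion matrix. The sketch below shows the standard definitions. The counts are invented for illustration and do not reproduce the article's results.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for a binary classifier (positive = bust, for example)."""

    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    tn: int  # predicted negative, actually negative
    fn: int  # predicted negative, actually positive

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    @property
    def negative_predictive_value(self) -> float:
        # Of the players predicted NOT to bust, the fraction that did not bust.
        return self.tn / (self.tn + self.fn)

    @property
    def positive_predictive_value(self) -> float:
        # Of the players predicted to bust, the fraction that did bust.
        return self.tp / (self.tp + self.fp)

    @property
    def predicted_positive_rate(self) -> float:
        # Share of all players that the model flags as positive.
        return (self.tp + self.fp) / self.total


# Illustrative counts only.
example = Confusion(tp=10, fp=10, tn=60, fn=20)
print(example.accuracy)                   # 0.7
print(example.negative_predictive_value)  # 0.75
print(example.positive_predictive_value)  # 0.5
print(example.predicted_positive_rate)    # 0.2
```

Because the four metrics weight the cells of the matrix differently, a model can have a modest accuracy and still be dependable when it says a player will not bust.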
At the end of the deep learning phase, a fairness post processor ensured equity across players on different teams. For example, popular teams such as the Rams would unfairly have more players predicted to boom and fewer predicted to bust based on the conversation of the crowd. As a result, the players were split into privileged and unprivileged groups. The output of boom and bust probabilities was slightly changed based on a player’s team.
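To make the idea of a group-based post processor concrete, here is a deliberately simple sketch: it shifts each group's probabilities so that the privileged and unprivileged groups end up with the same average. This is a stand-in technique chosen for brevity, not the post processor used in the system, and the probabilities are invented.

```python
from typing import List, Tuple


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def shift_to_common_mean(
    privileged: List[float], unprivileged: List[float]
) -> Tuple[List[float], List[float]]:
    """Shift each group's probabilities so both group means move to the pooled mean.

    Values are clipped to [0, 1] so that they remain valid probabilities.
    """
    pooled = mean(privileged + unprivileged)

    def shift(group: List[float]) -> List[float]:
        delta = pooled - mean(group)
        return [min(1.0, max(0.0, p + delta)) for p in group]

    return shift(privileged), shift(unprivileged)


# Invented boom probabilities: players on a heavily discussed team versus others.
privileged_boom = [0.70, 0.60, 0.50]
unprivileged_boom = [0.40, 0.30, 0.20]

adjusted_privileged, adjusted_unprivileged = shift_to_common_mean(
    privileged_boom, unprivileged_boom
)
print(round(mean(adjusted_privileged), 2), round(mean(adjusted_unprivileged), 2))
```

The ordering of players within each group is preserved, so the adjustment changes how the groups compare with each other without reshuffling players inside a team.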
Our system used a transformer model with a linear classification layer to classify the article sentiment. The sentiment model was trained using a pretrained transformer model based on OpenAI’s Generative Pretrained Transformer Model-2 architecture (GPTM), which was trained on Wikitext-103 data. We fine-tuned the model with our own labeled data by training the linear classifier within the GPTM model and altering hyperparameters.
The sentiment classifier had an accuracy of 81%. The classifier did well on labeling positive and negative articles with accuracies of 87% and 83%. The neutral sentiment had an accuracy of 61%, which was harder to predict because of inter-annotator disagreement. The sentiment labels next to each piece of evidence provided users with an overall estimate of the crowd’s opinion of each player.
Finally, the outputs of the deep learning layers along with structured ESPN data were input into an ensemble of multiple regression models. This merging of natural language evidence with traditional statistics produced a score projection for every player. On average, the combination of structured and unstructured data produced a better RMSE than each independently. Then, 24 PDFs were fit to the score projection and historical score trends to produce a player score distribution. While defining and refining these techniques, our team conducted data exploration in Jupyter Notebooks and IBM SPSS. Through experimentation, we selected model hyperparameters and algorithms. When we combined our models with ESPN, we were better than our parts.
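The best-fit PDF selection and sampling step can be sketched with a small candidate set. The example below fits only two distributions (normal and Laplace) by maximum likelihood, keeps the one with the higher log-likelihood, and samples from it to approximate a mean score. The candidate set, the selection criterion, and the score history are assumptions for illustration; the system described here fits 24 PDFs, and its selection criterion is not stated.

```python
import math
import random
import statistics
from typing import Callable, Dict, List, Tuple

Sampler = Callable[[random.Random], float]


def fit_normal(data: List[float]) -> Tuple[float, Sampler]:
    """Maximum-likelihood normal fit; returns (log-likelihood, sampler)."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    loglik = sum(
        -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
        for x in data
    )
    return loglik, lambda rng: rng.gauss(mu, sigma)


def fit_laplace(data: List[float]) -> Tuple[float, Sampler]:
    """Maximum-likelihood Laplace fit; returns (log-likelihood, sampler)."""
    loc = statistics.median(data)
    scale = statistics.fmean(abs(x - loc) for x in data)
    loglik = sum(-math.log(2 * scale) - abs(x - loc) / scale for x in data)
    # The difference of two exponential draws with mean `scale` is Laplace(0, scale).
    return loglik, lambda rng: (
        loc + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    )


def best_fit(data: List[float]) -> Tuple[str, Sampler]:
    """Pick the candidate distribution with the highest log-likelihood."""
    fits: Dict[str, Tuple[float, Sampler]] = {
        "normal": fit_normal(data),
        "laplace": fit_laplace(data),
    }
    name = max(fits, key=lambda key: fits[key][0])
    return name, fits[name][1]


# Invented weekly fantasy scores for one player.
scores = [8.0, 12.5, 15.0, 9.5, 22.0, 14.0, 11.0, 17.5, 13.0, 10.5]

name, sampler = best_fit(scores)
rng = random.Random(42)  # fixed seed so the simulation is repeatable
draws = [sampler(rng) for _ in range(10_000)]
print(name, round(statistics.fmean(draws), 1))
```

The sorted draws also give the spread of likely outcomes, which is what lets two players with similar projections be contrasted by the shape of their distributions.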
Player Insights with Watson experience
Pick your best players to start or find complementary players to add to your roster with our AI. Armed with Player Insights with Watson, you will be a confident and informed team manager. Share your best player cards on social media. We will see you in the championship. #winwithwatson
For additional details, please see https://sentic.net/wisdom2020baughman.pdf.