Behind the code: Fused Fantasy Football insights

ESPN and IBM have teamed up to bring a new level of insight to fantasy football team owners by correlating millions of news articles with traditional football statistics. Watson is built on an enterprise-grade machine learning pipeline that reads and comprehends millions of documents and multimedia sources about fantasy football. The ESPN Fantasy Football with Watson system has been a significant undertaking with many components.

This article is the sixth in an eight-part series that takes you behind each component to show you how we used Watson to build a fair, world-class AI solution.

AI Fantasy Football evidential fusion

Watson watches videos, listens to podcasts, and reads millions of news articles to generate assessments about your fantasy football players. You can let Watson discover the correlations of unstructured multi-modal data with traditional statistical information. Instead of reading, watching, and drawing links to statistics, fantasy football managers can use the fused AI insights for precise, oriented player predictions and score projections. Watson places statistical football data within the context of the late-breaking news content so that future data trends can inform roster decisions. You can create a starting line-up based on historical football statistics that has been influenced by recent news. #WinWithWatson

The statistical, biographical, sentiment, and unstructured data are merged at two points within the system. First, evidence from Watson Discovery, ESPN, and Watson Media is aggregated for each player. Derived predictors such as overall document sentiment are generated for input into each of the trained deep player classifiers. In addition, features about concepts, keywords, and entities are separated, or stratified, so that each section of the overall neural network receives the appropriate input. The player classifiers are neural networks that determine whether a player is going to boom, bust, play with a hidden injury, or play meaningful touches. Next, the output of the neural networks, the biographic data, and the sentiment data are passed as a vector to a position-based multiple regression function. Forty-six variables are fused together to calculate a score prediction. The score prediction is used within a simulation to create a probability density function that describes how likely a player is to score within a given range of values. A fair post processor is applied to the output of the classifiers to ensure that the models are fair and do not favor any group of teams. Finally, the classifier outputs are normalized so that each of the neural network outputs is comparable.

evidence fusion overview

In more detail, the Python code that invokes the deep learning fusion merges together several evidence sources within a neural network. The object MergedPlayerClassifier encapsulates the deep learning merging along with the activation function to use along each dimension of evidence. The instance variables describe how the object should be instantiated and the hyperparameters of the deep learning topology. The document-2-vector output from the machine learning pipeline inputs concepts, keywords, and entities into the deep learning network. Additional sources such as sentiment, biographical information, and player fantasy football stats are used together.

class MergedPlayerClassifier(object):
    """The class represents a deep learning model."""

    TANH = "tanh"
    RELU = "relu"

    def __init__(self, use_object_storage, activation_function, model_threshold,
                 model_type, model_file, weight_file, unique_labels,
                 total_dimensionality, total_input_nodes, object_storage_container,
                 sentiment_content_class_name, sentiment_content_size,
                 bio_content_size, player_stat_size, epochs=100, batch_size=20,
                 train_file=None, test_file=None, **kwargs):

        # Excerpt: only some of the instance variables are shown here.
        self._activation_function = activation_function
        self._model_threshold = model_threshold
        self._model_type = model_type
        self._use_object_storage = use_object_storage
        self._sentiment_content_class_name = sentiment_content_class_name
        self._sentiment_content_size = sentiment_content_size
        self._bio_content_size = bio_content_size
        self._player_stat_size = player_stat_size

The sentiment feature vector is of length 28. The attributes include sentiment average and count number for each of the 13 entities we trained Watson to learn through Watson Knowledge Studio. The average document sentiment and entity sentiment were included within the feature vector because they provided predictive power.

sentiment_contents = self._discovery_store.get_machine_learning_results(playerid, event_name, event_year, model_name, self._model_type, self._sentiment_content_class_name, data_timestamp=data_timestamp)
social_content_vector = sentiment_contents.generateVector()
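As a rough illustration of how such a vector could be laid out, the sketch below packs an average and a mention count for each entity type into a flat list. The entity names, the build_sentiment_vector helper, and the two document-level slots that bring the length from 26 to 28 are assumptions made for this example. They are not the layout used by the production generateVector method.

```python
from statistics import mean

# Hypothetical entity type names; the real system trained 13 types in Watson Knowledge Studio.
ENTITY_TYPES = [f"entity_{i}" for i in range(13)]

def build_sentiment_vector(entity_scores, document_scores):
    """Return [average, count] per entity type, then [average, count] for documents."""
    vector = []
    for entity in ENTITY_TYPES:
        scores = entity_scores.get(entity, [])
        vector.append(mean(scores) if scores else 0.0)
        vector.append(float(len(scores)))
    # Assumption: the last two slots hold document-level sentiment.
    vector.append(mean(document_scores) if document_scores else 0.0)
    vector.append(float(len(document_scores)))
    return vector

example = build_sentiment_vector({"entity_0": [0.5, -0.1]}, [0.2, 0.4, 0.0])
print(len(example))  # 13 entity types * 2 + 2 document-level slots
```

Entity types with no mentions fall back to an average of 0.0 and a count of 0, which keeps the vector length fixed for the network input.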

Each player's biographic information was retrieved and included in a player content feature vector. The biographic information included five attributes: age, years of experience, position, height, and weight. Data exploration showed that these attributes were the most informative and contributed to the boom, bust, play with hidden injury, and play meaningful touches classifications.

player_contents = self._discovery_store.get_player_features(playerid, event_name, event_year, 'Player')
bio_vector = player_contents.generate_feature_vector()

For each player that enters the machine learning pipeline, we retrieve the player statistics. The statistics include fantasy football data such as the percentage of leagues that own the player, the week, whether a player is likely to play, and whether a player is injured. The get_deep_learning_stats method retrieves the current week and the previous week. As a result, the feature vector contains rates of change between the previous week and the current week. For example, the difference between the current and previous week's percent-owned values is divided by the time elapsed between them.

player_stats = self._discovery_store.get_deep_learning_stats("PlayerStats",event_name,event_year,playerid)
player_stats_obj = PlayerStatistics(player_stats)
player_stats_vector = player_stats_obj.compute_feature_vector(self._player_stat_size)
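The rate-of-change idea can be sketched in a few lines. The rate_of_change helper, the statistic names, and the one-week default for elapsed time are hypothetical and chosen only for illustration. The actual compute_feature_vector method is not shown in this blog.

```python
def rate_of_change(previous, current, elapsed_weeks=1.0):
    """Return the per-week change for each statistic present in both snapshots."""
    if elapsed_weeks <= 0:
        raise ValueError("elapsed_weeks must be positive")
    return {
        name: (current[name] - previous[name]) / elapsed_weeks
        for name in current
        if name in previous
    }

# Hypothetical snapshots for two consecutive weeks.
week_5 = {"percent_owned": 62.0, "percent_started": 40.0}
week_6 = {"percent_owned": 70.0, "percent_started": 43.0}

deltas = rate_of_change(week_5, week_6)
print(deltas)  # {'percent_owned': 8.0, 'percent_started': 3.0}
```

A rising percent-owned value gives a positive rate, so the network can see momentum as well as the raw weekly level.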

Each of the feature vectors was input into a prediction method. The total dimensionality of the input was 237. The feature_vector encoded the average numerical representation of all keywords, concepts, and entities within news articles about a player. The topology of the neural network was defined in create_topology, a function nested inside the create_model method. The activation functions were selected based on empirical experimentation.

A set of parallel paths was created, each an individual neural network for biographical data, social content, player stats, or unstructured information comprehension. Within the Keras API, the parallel paths are Sequential objects. Different types of layers, such as dropout, dense, and batch normalization, are linked together within a feed-forward graph. The output of each neural network is input into a merge node that feeds an additional 25 layers. Through back-propagation with stochastic gradient descent and a binary cross-entropy loss function, the feature vectors were fused together to find additional predictors throughout the neural network. The output of the deep neural network was squashed with a sigmoid activation function to give a value between 0% and 100%.

prediction = self.predict(social_content_vector,feature_vector,bio_vector_np,player_stats_vector)
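The sigmoid squashing and the binary cross-entropy loss mentioned above can be written out in plain Python. This is a minimal sketch of the two formulas only, not the Keras training loop. The raw output values and labels are made up for the example.

```python
import math

def sigmoid(z):
    """Squash a raw network output into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Average binary cross-entropy between labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

raw_outputs = [2.0, -1.0, 0.0]  # hypothetical pre-activation values
labels = [1, 0, 1]              # hypothetical labels (1 = boom, 0 = not boom)

probabilities = [sigmoid(z) for z in raw_outputs]
loss = binary_cross_entropy(labels, probabilities)
print(probabilities, loss)
```

Multiplying a probability by 100 gives the percentage form described in the text. A lower loss means the predicted probabilities sit closer to the labels.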

def create_topology():
    """Define and create the topology of the neural network.

    Returns:
        a compiled model
    """
    # The branch bodies were elided in the original excerpt; mapping each
    # constant to the activation name is the presumed intent.
    if MergedPlayerClassifier.RELU == self._activation_function:
        activation = MergedPlayerClassifier.RELU
    elif MergedPlayerClassifier.TANH == self._activation_function:
        activation = MergedPlayerClassifier.TANH

    model_bio_content = Sequential()
    model_bio_content.add(Dropout(0.2, input_shape=(self._bio_content_size,)))

    model_social_content = Sequential()
    model_social_content.add(Dropout(0.2, input_shape=(self._sentiment_content_size,)))

    step_size = int(self._total_dimensionality / self._total_input_nodes)
    model_entities = Sequential()
    model_entities.add(Dropout(0.2, input_shape=(step_size,)))

    # The layers of the remaining branches are elided in this excerpt.
    model_keywords = Sequential()

    model_concepts = Sequential()

    model_player_stats = Sequential()

    model_total = Sequential()
    model_total.add(Merge([model_bio_content, model_social_content, model_entities, model_keywords, model_concepts, model_player_stats], mode='concat'))
    model_total.add(Dense(80*2, use_bias=True, activation=activation))
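The step_size calculation suggests that the flat document vector is cut into equal slices, one per input branch. The sketch below shows that idea under this assumption. The split_evenly helper and the 12-dimensional vector are hypothetical and are not taken from the production code.

```python
def split_evenly(vector, parts):
    """Split a flat vector into `parts` consecutive slices of equal length."""
    step_size = len(vector) // parts
    return [vector[i * step_size:(i + 1) * step_size] for i in range(parts)]

# Hypothetical 12-dimensional document vector shared by three branches.
doc_vector = [float(i) for i in range(12)]
entities, keywords, concepts = split_evenly(doc_vector, 3)
print(entities, keywords, concepts)
```

Each slice would then feed the branch whose input_shape is (step_size,). Any remainder from the integer division is dropped in this sketch.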


Next, a multiple regression function merges an additional 46 predictors. Multiple regression functions are trained for wide receivers, quarterbacks, running backs, tight ends, kickers, and defenses. The player biographic data is used as a pivot point to select the model to apply to a player’s feature vector. A custom probability spread algorithm is invoked to determine the best probability distribution function out of 25 possible selections.

def predict(self, model_name, feature_vector):
    try:
        # Look up the regression model for this position and run it.
        prediction = self._model_group[model_name].predict(feature_vector)
        return prediction
    except Exception as e:
        logger.error("Problem running a prediction with " + model_name + " " + str(e))
        raise
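To show how a position can act as the pivot for model selection, here is a minimal sketch with one small linear model per position. The coefficients, the two-feature input, and the predict_score helper are invented for illustration. The production models fuse 46 predictors and are not reproduced here.

```python
# Hypothetical coefficients, one linear model per position.
POSITION_MODELS = {
    "QB": {"intercept": 12.0, "weights": [4.0, -2.0]},
    "RB": {"intercept": 8.0, "weights": [5.0, -3.0]},
}

def predict_score(position, features):
    """Pick the regression model for a position and evaluate it on the features."""
    model = POSITION_MODELS[position]
    if len(features) != len(model["weights"]):
        raise ValueError("feature vector length does not match the model")
    weighted = sum(w * x for w, x in zip(model["weights"], features))
    return model["intercept"] + weighted

# Illustrative features: [boom probability, bust probability] from the classifiers.
score = predict_score("RB", [0.7, 0.2])
print(score)
```

Keeping one model per position lets each set of coefficients reflect how that position scores fantasy points.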

spread_tuple = self._simulation.calculate_point_and_probability_spread(prediction,outside_projection=outside_projection)
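One common way to choose among candidate distributions is to fit each one and compare log-likelihoods. The sketch below does this for two candidates, normal and Laplace, standing in for the 25 selections mentioned above. It is not the custom probability spread algorithm, and the simulated scores are made up.

```python
import math
from statistics import mean, median, pstdev

def normal_log_likelihood(data):
    """Log-likelihood under a normal fit (maximum-likelihood mean and std)."""
    mu, sigma = mean(data), pstdev(data)
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )

def laplace_log_likelihood(data):
    """Log-likelihood under a Laplace fit (median and mean absolute deviation)."""
    loc = median(data)
    scale = mean([abs(x - loc) for x in data])
    return sum(-math.log(2 * scale) - abs(x - loc) / scale for x in data)

def best_distribution(data):
    """Return the best candidate name and the log-likelihood of every candidate."""
    # Assumes the data are not all identical, so both scale estimates are positive.
    fits = {
        "normal": normal_log_likelihood(data),
        "laplace": laplace_log_likelihood(data),
    }
    return max(fits, key=fits.get), fits

# Hypothetical simulated fantasy scores for one player.
simulated_scores = [4.0, 9.5, 10.0, 10.2, 10.5, 11.0, 18.0]
best_name, fits = best_distribution(simulated_scores)
print(best_name, fits)
```

The chosen distribution can then describe the spread of likely scores around the point projection.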

As the details in this blog show, Watson provides AI insights from a diverse set of data sources. The player predictions and score distributions are highly precise so that you can select your best roster week over week. #WinWithWatson

Check back next time as I discuss fantasy football deep insight visualizations. To find out more, follow Aaron Baughman on Twitter: @BaughmanAaron.

The ESPN Fantasy Football logo is a trademark of ESPN, Inc. Used with permission of ESPN, Inc.