The IBM Watson Personality Insights service (formerly the IBM Watson User Modeling service, which was in beta) was released in February as a Generally Available (GA) service on IBM Bluemix. The Personality Insights service uses linguistic analytics to infer the personality traits (“Big Five”), intrinsic needs, and values of individuals from communications that they make available through media such as email, text messages, tweets, forum posts, and more. These three types of information correspond to the color-coded sections shown when visualizing the analysis that is performed by Personality Insights:
This blog post discusses the extensive research behind the development of the Personality Insights service. For complete information about the service and references to papers that have been published on that research (and on personality traits, intrinsic needs, and values in general), see the Personality Insights documentation.
We started our work on Personality Insights in 2012 as part of a research effort to understand social media behaviors such as responding to questions or propagating information (for example, retweeting on Twitter). We explored the characteristics of individuals on social media to understand their impact on such behaviors, and found that people with specific personality characteristics responded and retweeted in higher numbers. For example, people who scored high on excitement-seeking were more likely to respond, people who scored high on cautiousness were less likely to respond, and people who scored high on modesty, openness, and friendliness were more likely to propagate information.
As part of our work, we also developed a method to compute Big Five and other personality traits from textual information. The method is inspired by existing research and builds on traditional approaches, such as using the Linguistic Inquiry and Word Count (LIWC) psycholinguistic dictionary to find psychologically meaningful word categories, while improving upon them.
To infer personality characteristics from textual information, we tokenize the input text and match the tokens against the LIWC psycholinguistic dictionary to compute a score for each dictionary category. The matched words are often self-reflective, such as words about work, family, friends, health, money, feelings, achievement, and positive and negative emotions. We then use a weighted combination of the LIWC category scores to derive Big Five and facet scores. The weights are coefficients between category scores and characteristics, derived by comparing personality scores obtained from surveys with LIWC category scores computed from text written by more than 500 individuals.
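The scoring pipeline described above (tokenize, count matches per LIWC category, then combine category scores with per-characteristic weights) can be sketched in Python. This is a minimal illustration under stated assumptions, not the service's implementation: the real LIWC dictionary is licensed and much larger, and the categories, words, and weights below (`TOY_DICTIONARY`, `TOY_WEIGHTS`) are hypothetical placeholders rather than the coefficients the team derived.

```python
import re
from collections import Counter

# Hypothetical stand-in for the LIWC dictionary: category -> words.
# The real dictionary is licensed, much larger, and also contains
# wildcard prefix entries, which this sketch does not handle.
TOY_DICTIONARY = {
    "work": {"job", "office", "project", "deadline"},
    "family": {"mother", "father", "sister", "brother"},
    "posemo": {"happy", "love", "great", "nice"},
    "negemo": {"sad", "angry", "hate", "worried"},
}

# Hypothetical weights: characteristic -> {category: coefficient}.
# In the service, such weights come from survey-versus-text comparisons.
TOY_WEIGHTS = {
    "extraversion": {"family": 0.3, "posemo": 0.5, "negemo": -0.2},
    "conscientiousness": {"work": 0.6, "negemo": -0.3},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_scores(tokens, dictionary):
    """Return the fraction of tokens that fall in each dictionary category."""
    counts = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {category: counts[category] / total for category in dictionary}


def characteristic_scores(cat_scores, weights):
    """Weighted combination of category scores for each characteristic."""
    return {
        name: sum(w * cat_scores.get(category, 0.0) for category, w in coeffs.items())
        for name, coeffs in weights.items()
    }


text = "I love my job and my sister is happy, but the deadline makes me worried"
cats = category_scores(tokenize(text), TOY_DICTIONARY)
print(characteristic_scores(cats, TOY_WEIGHTS))
```

Normalizing by the total token count keeps scores comparable across texts of different lengths; whether the service normalizes the same way is not stated in the post.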
To infer Values, we use coefficients between Values and LIWC category scores. Because no prior work had reported such coefficients, we derived them ourselves by comparing Values scores obtained from surveys with LIWC category scores computed from text written by more than 800 individuals.
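As an illustration of deriving such coefficients, the sketch below computes a Pearson correlation between each surveyed Values dimension and each LIWC category across the same set of people. The post does not say which kind of coefficient was used, so Pearson correlation is an assumption here, as is the input layout (`survey_scores` and `category_scores`, with one list entry per person in the same order).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def derive_coefficients(survey_scores, category_scores):
    """Correlate every surveyed Value with every LIWC category.

    survey_scores:   {value_name: [score for person 1, person 2, ...]}
    category_scores: {category:   [score for person 1, person 2, ...]}
    Returns {value_name: {category: coefficient}}.
    """
    return {
        value: {cat: pearson(v_scores, c_scores) for cat, c_scores in category_scores.items()}
        for value, v_scores in survey_scores.items()
    }


# Hypothetical data for four people.
survey = {"self_transcendence": [1.0, 2.0, 3.0, 4.0]}
liwc = {"posemo": [0.02, 0.04, 0.06, 0.08], "negemo": [0.04, 0.03, 0.02, 0.01]}
print(derive_coefficients(survey, liwc))
```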
To infer Needs, we use a statistical model built from ground-truth scores obtained through a Needs survey, correlated against textual features derived from text written by more than 200 users. These textual features are computed with a custom dictionary derived from the text of users who expressed different needs.
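The post describes the Needs model only as a statistical model fit to survey ground truth using features from a custom dictionary. Purely as an illustration, the sketch below assumes the simplest such model: one feature per need (the share of a user's words found in a hypothetical custom dictionary for that need) and an ordinary least-squares line fit to the survey scores. The word list `CLOSENESS_WORDS` and the training data are invented for the example.

```python
def dictionary_feature(tokens, custom_words):
    """Share of a user's tokens that appear in a custom need dictionary."""
    if not tokens:
        return 0.0
    return sum(token in custom_words for token in tokens) / len(tokens)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has no variance; cannot fit a line")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical custom dictionary for the "closeness" need.
CLOSENESS_WORDS = {"together", "family", "home", "friends"}

# Hypothetical training users: tokenized text and surveyed need score.
users = [
    (["work", "deadline", "meeting", "report"], 0.2),
    (["home", "with", "family", "tonight"], 0.5),
    (["friends", "together", "at", "home"], 0.8),
]
xs = [dictionary_feature(tokens, CLOSENESS_WORDS) for tokens, _ in users]
ys = [score for _, score in users]
slope, intercept = fit_line(xs, ys)

# Predict the need score for a new, unseen user.
new_x = dictionary_feature(["dinner", "with", "friends", "together"], CLOSENESS_WORDS)
print(slope * new_x + intercept)
```

A production model would likely use many features and a multivariate fit; the single-feature line keeps the idea of "ground truth correlated against dictionary features" visible.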
To help verify the accuracy of our models, we compared scores derived by the models with survey-based scores for 250 Twitter users. We found that scores for characteristics inferred for all three models correlated significantly with survey-based scores for more than 80 percent of the Twitter users. Participants also used a five-point scale to rate how well each derived characteristic matched their perceptions of themselves, which showed that the inferred characteristics largely matched their self-perceptions. The following figure summarizes these ratings:
We can infer personality characteristics with reasonable accuracy, but the real question is how they affect real-world behavior. Over the last few years, we have conducted a set of studies to identify the extent to which such personality characteristics can predict people’s behavior and preferences. We found that people with high openness and low emotional range (neuroticism) scores were more likely to respond favorably to opportunities such as clicking on an advertisement or following an account. For example, targeting the top 10 percent of such users increased the click rate from 6.8 percent to 11.3 percent and the follow rate from 4.7 percent to 8.8 percent.
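The targeting experiment can be illustrated with a small sketch: rank users by a score that rewards high openness and low emotional range, keep the top 10 percent, and compare that segment's response rate with the overall rate. The ranking rule `openness - emotional_range` and the user records are hypothetical; the post does not say how the two traits were combined. The `relative_lift` helper also expresses the reported rates in relative terms.

```python
def targeting_score(user):
    """Hypothetical ranking rule: high openness and low emotional range rank higher."""
    return user["openness"] - user["emotional_range"]


def top_fraction(users, fraction=0.10):
    """Return the highest-scoring `fraction` of users (at least one)."""
    ranked = sorted(users, key=targeting_score, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return ranked[:k]


def response_rate(users):
    """Share of users who responded (for example, clicked an ad)."""
    return sum(user["responded"] for user in users) / len(users)


def relative_lift(baseline_rate, targeted_rate):
    """Relative increase of the targeted rate over the baseline rate."""
    return (targeted_rate - baseline_rate) / baseline_rate


# Hypothetical population of 20 users in which the two most open
# users are the only ones who responded.
population = [
    {"openness": i / 20, "emotional_range": 0.5, "responded": i >= 18}
    for i in range(20)
]
segment = top_fraction(population)
print(response_rate(population), response_rate(segment))

# Relative lifts implied by the click and follow rates reported above.
print(round(relative_lift(0.068, 0.113), 2))  # click rate
print(round(relative_lift(0.047, 0.088), 2))  # follow rate
```

With the reported rates, the click-rate increase is roughly 66 percent in relative terms and the follow-rate increase roughly 87 percent.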
Multiple recent studies show similar results for characteristics computed from social media data. One recent study with retail store data found that people with high orderliness, self-discipline, and cautiousness scores, but low immoderation scores, were 40 percent more likely to respond to coupons than the random population. A second study found that people with specific Values had specific reading interests. For example, people with a higher self-transcendence (motivation to help others) value demonstrated an interest in reading articles about the environment, and people with a higher self-enhancement (concerned with their own success) value showed an interest in reading articles about work. A third study of more than 600 Twitter users found that a person’s personality characteristics can predict his or her brand preference with 65 percent accuracy.
When inferring information from text, a key question is how much text is required to make reliable inferences about personality characteristics. We have run experiments to understand how word quantity affects our assessments, and found that the service requires at least 3500 words written by an individual to produce a personality portrait with meaningful results. If you submit fewer than 2000 words, the service reports a warning but still processes the input. If you provide fewer than 100 words, the service reports an error and does not analyze the input text. In addition, the input text must contain at least 70 words that match words in the LIWC psycholinguistic dictionary; this minimum was verified through a series of experiments with the corpus used by the service.
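A client could check the same limits before calling the service. The sketch below mirrors the thresholds stated above (error below 100 words, warning below 2000 words, at least 70 words matching the LIWC dictionary, 3500 words for meaningful results). The function name `check_input`, the simple word tokenizer, and treating too few LIWC matches as an error are assumptions, and the service's own word counting may differ.

```python
import re

MIN_WORDS = 100            # below this, the service reports an error
WARN_BELOW_WORDS = 2000    # below this, the service warns but still analyzes
RECOMMENDED_WORDS = 3500   # needed for a portrait with meaningful results
MIN_LIWC_MATCHES = 70      # words that must match the LIWC dictionary


def check_input(text, liwc_words):
    """Return ("error" | "warning" | "ok", message) for a candidate input."""
    words = re.findall(r"[a-z']+", text.lower())
    count = len(words)
    if count < MIN_WORDS:
        return "error", f"{count} words; at least {MIN_WORDS} are required"
    matches = sum(word in liwc_words for word in words)
    if matches < MIN_LIWC_MATCHES:
        return "error", f"{matches} LIWC matches; at least {MIN_LIWC_MATCHES} are required"
    if count < WARN_BELOW_WORDS:
        return "warning", f"{count} words; results may not be meaningful"
    if count < RECOMMENDED_WORDS:
        # The post describes no warning in this band, only that 3500 words
        # are needed for meaningful results, so the input is accepted with a note.
        return "ok", f"{count} words; {RECOMMENDED_WORDS} are recommended"
    return "ok", f"{count} words"


print(check_input("happy " * 80 + "table " * 70, {"happy"}))
```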
What’s next for the Personality Insights service? We are initially focused on expanding the service to additional data sources, because its current models were trained on specific online media. Applying the service to other media sources can currently shift scores by two to sixteen percent, so we are building models from other online media sources to reduce such variations. We are also conducting more validation studies with larger numbers of users to assess the accuracy and impact of our models in the real world. Finally, because our current models are all based on English-language text, we are developing models for other languages.
We are excited to see that a large number of applications were built using our service during its beta release as the User Modeling service, and that new applications are being built using the generally available Personality Insights service. For detailed information about using the service or about the research that was discussed in this post, see the Personality Insights documentation. Please provide feedback on the service or ask questions in the Watson community.