The IBM Watson Personality Insights service (formerly known as IBM Watson User modeling service in Beta) was released in February as a Generally Available (GA) service on IBM Bluemix. The Personality Insights service uses linguistic analytics to infer the personality traits (“Big Five”), intrinsic needs, and values of individuals from communications that the user makes available via mediums such as email, text messages, tweets, forum posts, and more. These three types of information correspond to the color-coded sections shown when visualizing the analysis that is performed by Personality Insights:


This blog post discusses the extensive research behind the development of the Personality Insights service. For complete information about the service and references to papers that have been published on that research (and on personality traits, intrinsic needs, and values in general), see the Personality Insights documentation.

We started our work on Personality Insights in 2012 as part of a research effort to understand social media behaviors such as responding to questions or propagating information (for example, retweeting on Twitter). We explored the characteristics of individuals as expressed in social media to understand their impact on such behaviors, and found that people with specific personality characteristics responded and retweeted in higher numbers. For example, people who scored high on excitement-seeking were more likely to respond, those who scored high on cautiousness were less likely to respond, and those who scored high on modesty, openness, and friendliness were more likely to propagate information.

As part of our work, we also developed a method to compute Big Five and other personality traits from textual information. The method is inspired by existing research and builds on traditional approaches, such as using the Linguistic Inquiry and Word Count (LIWC) psycholinguistic dictionary to find psychologically meaningful word categories, while improving upon them.

To infer personality characteristics from textual information, we tokenize the input text and match the tokens against the LIWC psycholinguistic dictionary to compute a score for each dictionary category. The matched words are often self-reflective, such as words about work, family, friends, health, money, feelings, achievement, and positive and negative emotions. We then use a weighted combination to derive Big Five and facet scores from the LIWC category scores. The weights are coefficients between category scores and characteristics; they were derived by comparing personality scores obtained from surveys with LIWC category scores from text for over 500 individuals.
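The steps described above (tokenize, match against dictionary categories, then combine category scores with weights) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the service's implementation: the dictionary entries, category names, and weights below are invented, since the real LIWC dictionary is licensed and the actual coefficients are not given in this post.

```python
import re
from collections import Counter

# Toy stand-in for a psycholinguistic dictionary (category -> words).
# ASSUMPTION: these categories and words are invented for illustration.
TOY_DICTIONARY = {
    "work": {"job", "office", "project"},
    "family": {"mother", "brother", "family"},
    "posemo": {"happy", "love", "great"},
}

# ASSUMPTION: hypothetical weights (trait -> {category: coefficient}).
TOY_WEIGHTS = {
    "conscientiousness": {"work": 0.30, "posemo": 0.05},
    "agreeableness": {"family": 0.20, "posemo": 0.25},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_scores(tokens):
    """Return, for each category, the fraction of tokens that match it."""
    counts = Counter()
    for token in tokens:
        for category, words in TOY_DICTIONARY.items():
            if token in words:
                counts[category] += 1
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {category: counts[category] / total for category in TOY_DICTIONARY}


def trait_scores(cat_scores):
    """Weighted combination of category scores for each trait."""
    return {
        trait: sum(weight * cat_scores.get(category, 0.0)
                   for category, weight in weights.items())
        for trait, weights in TOY_WEIGHTS.items()
    }


tokens = tokenize("I love my job and my family")
scores = trait_scores(category_scores(tokens))
print(scores)
```

Normalizing by the total token count keeps category scores comparable across texts of different lengths; whether the service normalizes this way is an assumption of the sketch.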

To infer Values, we use coefficients between Values and LIWC category scores. Because no prior work reported such coefficients, we derived them by comparing Values scores obtained from surveys with LIWC category scores obtained from text written by more than 800 individuals.
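One way to picture how a coefficient between survey scores and category scores might be derived is a plain correlation across individuals. The sketch below uses the Pearson correlation; the post does not say which statistic was actually used, so that choice, the function names, and the sample data are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / sqrt(var_x * var_y)


def derive_coefficients(category_scores_per_person, survey_scores):
    """Correlate each category's scores with the survey scores.

    category_scores_per_person: one {category: score} dict per individual.
    survey_scores: one survey-based score per individual, in the same order.
    """
    categories = category_scores_per_person[0].keys()
    return {
        category: pearson(
            [person[category] for person in category_scores_per_person],
            survey_scores,
        )
        for category in categories
    }


# ASSUMPTION: invented data for four individuals, not from any study.
people = [
    {"work": 0.01, "family": 0.04},
    {"work": 0.02, "family": 0.03},
    {"work": 0.03, "family": 0.02},
    {"work": 0.04, "family": 0.01},
]
survey = [1.0, 2.0, 3.0, 4.0]
print(derive_coefficients(people, survey))
```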

To infer Needs, we use a statistical model based on ground-truth scores that were obtained through a Needs survey and are correlated against textual features that were derived from text written by more than 200 users. Such textual features are computed by using a custom dictionary derived from the text of users who expressed different needs.

To help verify the accuracy of our models, we compared scores derived by the models with survey-based scores for 250 Twitter users. We found that scores for characteristics inferred for all three models correlated significantly with survey-based scores for more than 80 percent of the Twitter users. Participants also used a five-point scale to rate how well each derived characteristic matched their perceptions of themselves, which showed that the inferred characteristics largely matched their self-perceptions. The following figure summarizes these ratings:


We can infer personality characteristics with reasonable accuracy, but the real question is how they affect real-world behavior. Over the last few years, we have conducted a set of studies to identify the extent to which such personality characteristics can predict people’s behavior and preferences. We found that people with high openness and low emotional range (neuroticism) scores were likely to respond favorably to opportunities such as clicking on an advertisement or following an account. Specifically, targeting the top 10 percent of such users increased the click rate from 6.8 percent to 11.3 percent and the follow rate from 4.7 percent to 8.8 percent.
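To put the click and follow rates above in relative terms, the short calculation below computes the lift over the baseline for each. The rates are taken from the text; the helper name `relative_lift` is just a label chosen for this sketch.

```python
def relative_lift(baseline, targeted):
    """Relative improvement of the targeted rate over the baseline rate."""
    if baseline <= 0:
        raise ValueError("baseline rate must be positive")
    return (targeted - baseline) / baseline


# Rates in percent, as reported in the text.
click_lift = relative_lift(6.8, 11.3)
follow_lift = relative_lift(4.7, 8.8)

print(f"click rate lift:  {click_lift:.1%}")
print(f"follow rate lift: {follow_lift:.1%}")
```

In other words, the targeted group's click rate was roughly two-thirds higher than the baseline, and its follow rate was close to double.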

Multiple recent studies show similar results for characteristics computed from social media data. One recent study with retail store data found that people with high orderliness, self-discipline, and cautiousness scores, but low immoderation scores, were 40 percent more likely to respond to coupons than the random population. A second study found that people with specific Values had specific reading interests. For example, people with a higher self-transcendence (motivation to help others) value demonstrated an interest in reading articles about the environment, and people with a higher self-enhancement (concerned with their own success) value showed an interest in reading articles about work. A third study of more than 600 Twitter users found that a person’s personality characteristics can predict his or her brand preference with 65 percent accuracy.

When inferring information from text, a key question is how much text is required to make reliable inferences about personality characteristics. We have run experiments to understand how word quantity affects our assessments, and have found that the service requires at least 3500 words written by an individual to produce a personality portrait with meaningful results. If you submit fewer than 2000 words, the service reports a warning but still processes the input. If you provide fewer than 100 words, the service reports an error and does not analyze the input text. In addition, the input text must contain at least 70 words that match words found in the standard Linguistic Inquiry and Word Count (LIWC) psycholinguistic dictionary. This minimum of 70 matching words was verified through a series of experiments with the corpus used by the service.
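A client could apply the thresholds above as a pre-check before submitting text. The sketch below is a local illustration only and does not call the service. The status labels are invented, the whitespace-based word count is a simplification, and two behaviors are assumptions rather than statements from this post: how input between 2000 and 3500 words is treated, and what happens when fewer than 70 dictionary words match.

```python
# Thresholds as stated in the text.
MIN_WORDS = 100               # below this, the service reports an error
WARN_BELOW_WORDS = 2000       # below this, the service reports a warning
RECOMMENDED_WORDS = 3500      # needed for meaningful results
MIN_DICTIONARY_MATCHES = 70   # required matches against the dictionary


def count_words(text):
    """Count whitespace-separated words (a simplification)."""
    return len(text.split())


def classify_input(word_count, dictionary_matches):
    """Return a hypothetical status label for a piece of input text."""
    if word_count < MIN_WORDS:
        return "error"
    if dictionary_matches < MIN_DICTIONARY_MATCHES:
        # ASSUMPTION: the post states the requirement but not the outcome.
        return "too_few_matches"
    if word_count < WARN_BELOW_WORDS:
        return "warning"
    if word_count < RECOMMENDED_WORDS:
        # ASSUMPTION: processed without a warning, yet short of 3500 words.
        return "below_recommended"
    return "ok"


print(classify_input(1500, 120))
```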

What’s next for the Personality Insights service? We are initially focused on expanding the applicability of the service to additional data sources because the current models for the service were trained on specific online media. Using other media sources can currently affect scoring by between two and sixteen percent. We are therefore working to build models from other online media sources to reduce such variations. We are also conducting more validation studies with a larger number of users to assess the accuracy and impact of our models in the real world. Finally, we are developing models for other languages because our current models are all based on English-language text.

We are excited to see that a large number of applications were built using our service during its beta release as the User Modeling service, and that new applications are being built using the generally available Personality Insights service. For detailed information about using the service or about the research that was discussed in this post, see the Personality Insights documentation. Please provide feedback on the service or ask questions in the Watson community.

12 comments on "IBM Watson Personality Insights: The science behind the service"

  1. Amy Dolzine April 01, 2015

    Curious if this study can be extrapolated to behaviors in “internal social” networks (Enterprise Social) too?

    • Jalal Mahmud April 02, 2015

      Our published work uses data from external social media such as Twitter, Forum data, etc. However, we have started investigating applicability to enterprise social media. Our analysis of language use and member satisfaction in enterprise community appeared in CSCW 2015 (

  2. Does this study give weight to the domain knowledge that the psychological side of Big 5 requires?
    There are many lexical studies which spawn insights into Personality characteristics of a person. A work under publication (I might be able to link to say by next month) exploits these very psychological studies to capture domain knowledge into the predictive analysis.

    • Jalal Mahmud June 09, 2015

      We have not incorporated additional domain knowledge to infer personality insight characteristics. However, it will be interesting to explore personality insight with additional domain knowledge.

    • Hi Shivani,

      If you have the link to that paper you mentioned in your comment, I’d be really interested to read it. Thanks a lot!

  3. Would be interesting to run over corporate internal data (e-mail for example). Mixing it with internal changes would give a nice insight on the effect executive decisions have on internal departments.

    • Keely Wright June 09, 2015

      That is indeed an interesting use case for the Personality Insights service. Likewise, I can envision the service being used to analyze blog comments for internal (and external) blog announcements.

  4. Vasant Kumar July 20, 2015

    Hi Jalal – good work! I have been playing with your PI a bit. After the initial aha! moments, I found a high degree of erratic results. Different text samples from the same author produce vastly different profiles e.g. artistic interests from 1% to 98% or even critical aspects like scoring for Extraversion going from 1% to 80%. I am familiar with the work of James Pennebaker and the foundations of PI from that but is Watson-PI in a state yet where it can offer actionable insights?

  5. Hello, can you please guide me? using watson theory how can i generate personality indices?

  6. karthikk.222 June 22, 2016

    It's amazing to see how new disruptive technologies can change the world, How theoretically generated data of various personality traits can closely match with experimentally verified data. But how accurate are the parameters based on which the theoretical data is evaluated, some unknown factors which have not been considered which is currently beyond human imagination, how accurate have the participants been in evaluating themselves to generate the experimental data for comparison, are all questions I still ponder over. I’m sure time and experience can get us past some unknown loopholes which could maybe help us understand human nature, behavior and happiness factor to an extent that can have a strong positive impact on society

  7. “A third study of more than 600 Twitter users found that a person’s personality characteristics can predict his or her brand preference with 65 percent accuracy.”

    Can you tell us which study this was? Thank you.
