As explained in a previous blog, our service relies on psychometric survey-based scores. In short, to collect our ground truth data, we administer standard psychometric surveys to large populations and, for each user, collect the survey responses along with their Twitter posts (more details here). For this release, we tripled the number of users in our ground truth, giving us much greater confidence in our results.
Along with collecting more data, we changed the nature of our model. Previously, we guided the features in our model using Linguistic Inquiry and Word Count (LIWC) dictionary categories. In the newer model, we adopted a more robust approach that keeps up with the latest and emerging vocabulary on social media. The newer model, presently available only in English, therefore eliminates the LIWC dictionary altogether and instead uses vector representations of words (word embeddings) called GloVe, developed at Stanford University. GloVe is trained on aggregated global word-word co-occurrence statistics from a very large corpus, and the resulting representations capture semantic similarities and differences between words. Using GloVe word vectors as features in our model allowed us to close the performance gap of our model on short texts and even outperform our previous model on long texts.
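To illustrate the idea of using word vectors as features, here is a minimal sketch of turning a piece of text into a single fixed-length feature vector by averaging per-word embeddings. The tiny `glove` dictionary below is a toy stand-in for illustration only; a real pipeline would load pretrained vectors (e.g. a `glove.6B.300d.txt` file from the Stanford NLP release), and this is not necessarily how the Personality Insights model consumes the vectors internally.

```python
import numpy as np

# Toy stand-in for pretrained GloVe vectors (3 dimensions instead of
# the usual 50-300). A real pipeline would parse the Stanford release files.
glove = {
    "happy":   np.array([0.8, 0.1, 0.3]),
    "excited": np.array([0.7, 0.2, 0.4]),
    "project": np.array([0.1, 0.9, 0.2]),
}

def document_vector(text, embeddings):
    """Average the embeddings of all known tokens into one
    fixed-length feature vector for the whole document."""
    tokens = text.lower().split()
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:  # no known words: return the zero vector
        dim = len(next(iter(embeddings.values())))
        return np.zeros(dim)
    return np.mean(vectors, axis=0)

features = document_vector("Happy and excited about the project", glove)
```

Because the averaged vector lives in the same semantic space regardless of text length, this kind of representation degrades more gracefully on short inputs than a dictionary-category count does.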
To study the accuracy of our model, we performed correlation and Mean Absolute Error (MAE) analysis to compare the trait scores calculated by our Personality Insights machine learning model with the corresponding psychometric measures collected from administering the surveys to all 1,500 subjects. We averaged the correlation and the MAE over all the personality traits and report them as a function of the number of words a user of our service would submit. The graphs below show the correlation and the MAE on the y-axis and the number of words on the x-axis.
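The evaluation described above can be sketched in a few lines: for each trait, compare the model's scores against the survey ground truth with Pearson correlation and MAE, then average across traits. The trait names and score values below are made up purely for illustration; they are not the study's data.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two score lists."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def mae(x, y):
    """Mean Absolute Error between two score lists."""
    return float(np.mean(np.abs(np.asarray(x, float) - np.asarray(y, float))))

# Hypothetical normalized trait scores for five subjects:
# survey-based ground truth vs. model predictions.
survey = {"openness":      [0.2, 0.4, 0.6, 0.8, 0.9],
          "agreeableness": [0.5, 0.3, 0.7, 0.6, 0.4]}
model  = {"openness":      [0.25, 0.35, 0.55, 0.85, 0.8],
          "agreeableness": [0.45, 0.4, 0.6, 0.65, 0.5]}

# Average both metrics over all traits, as in the graphs.
avg_corr = float(np.mean([pearson(survey[t], model[t]) for t in survey]))
avg_mae  = float(np.mean([mae(survey[t], model[t]) for t in survey]))
```

Repeating this computation for inputs truncated to different word counts yields the curves of correlation and MAE versus number of words.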
Those graphs show that our new model performs much better than the previous one for all text lengths. With 3,000 words, the new model provides 40% higher correlation than before. In fact, with just 450 words the new model provides the same correlation as the older one provided with 3,000 words. This translates to an improved model (see Mean Absolute Error and other details here) that requires fewer words to infer personality. We recommend that you provide at least 1,200 words of input text, which results in correlation within 90% of the best correlation and an MAE within two percent of the best MAE the service can return. Submitting between 600 and 1,200 words results in an MAE within three percent of the best MAE (correlation within 80%), which IBM still considers acceptable. Providing at least 3,000 words approaches the service's maximum precision. As before, you must submit at least 100 words; otherwise, the service reports an error. If you submit fewer than 600 words, the service reports a warning but still analyzes the input text.
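Client code can check these documented thresholds before calling the service. The helper below is a hypothetical pre-flight check, not part of the Watson API; it simply mirrors the word-count tiers described above.

```python
def check_word_count(text):
    """Classify input length against the documented thresholds:
    fewer than 100 words is rejected with an error, fewer than 600
    draws a warning, 600-1,199 is acceptable, and 1,200 or more is
    the recommended range. Returns (status, word_count)."""
    n = len(text.split())
    if n < 100:
        return "error", n        # service would refuse this input
    if n < 600:
        return "warning", n      # analyzed, but flagged as short
    if n < 1200:
        return "acceptable", n   # within three percent of best MAE
    return "recommended", n      # within two percent of best MAE
```

For example, a 50-word input would come back as `"error"`, while a 1,500-word input lands in the recommended range.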
With this new model, the Watson Personality Insights service offers improved performance for users. Further, we have been able to reduce the number of words recommended for acceptable performance by almost a factor of three. The IBM Watson Personality Insights team is very excited about the capabilities this release will enable and is looking forward to your feedback. We are currently releasing this new model for English. Other languages (Spanish, Japanese, and Arabic) will follow shortly.
Given that this is a new model, the results it produces differ from those of the previous one. We remind our users that while results for an individual may not match between the new and old models, the overall performance of the new model, as measured by both correlation and MAE, is better than that of the old. Therefore, users should see better results overall with this new model. Users and clients who cache personality results have a choice to make: keep the old results as-is, or recompute them with the newer model. Our recommendation is that if you are doing a population analysis, for use cases such as customer segmentation based on personality or adding personality traits to prediction models, then the personality of every user in the population should be calculated with the same model. In either case, users are encouraged to understand that, given the complexity of predicting one's personality traits from proxies such as text, some amount of error is unavoidable. However, we are pleased to note that our new model is better than the old one.
We are continuously improving our models and will keep you abreast of any future updates. The development team responsible for Personality Insights includes: Pierre Arnoux, Rama Akkiraju, Neil Boyette, Jalal Mahmud and Vibha Sinha. Kenneth R Kuo is the offering manager. Steffi Diamond is the release manager.