IBM Debater® Sentiment Composition Lexicons


This dataset addresses sentiment composition – predicting the sentiment of a phrase from the interaction between its constituents. For example, in the phrases “reduced bureaucracy” and “fresh injury”, both “reduced” and “fresh” are followed by a negative word. However, “reduced” flips the negative polarity, resulting in a positive phrase, while “fresh” propagates the negative polarity to the phrase level, resulting in a negative phrase. Accordingly, “reduced” is part of our “reversers” lexicon, and “fresh” is part of the “propagators” lexicon.
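
The two behaviors described above can be sketched in a few lines of Python. This is a minimal illustration only, not the method used to build the lexicons: the sets and the word polarities below are toy values taken from the examples in the text, and the names REVERSERS, PROPAGATORS, WORD_POLARITY, and compose are hypothetical.

```python
# Toy lexicons for illustration only; the real entries are in SEMANTIC_CLASSES.xlsx.
REVERSERS = {"reduced"}
PROPAGATORS = {"fresh"}

# Hypothetical word-level polarities: -1 is negative, +1 is positive.
WORD_POLARITY = {"bureaucracy": -1, "injury": -1}


def compose(modifier: str, head: str) -> int:
    """Return the polarity of a two-word phrase, or 0 if it cannot be determined."""
    polarity = WORD_POLARITY.get(head, 0)
    if modifier in REVERSERS:
        return -polarity  # a reverser flips the polarity of the word it modifies
    if modifier in PROPAGATORS:
        return polarity  # a propagator passes the polarity up to the phrase
    return 0  # modifier is in neither toy lexicon


print(compose("reduced", "bureaucracy"))  # 1
print(compose("fresh", "injury"))  # -1
```

Here "reduced bureaucracy" comes out positive and "fresh injury" comes out negative, matching the examples above.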

Dataset Metadata

Field               Value
Format              XLSX
License             CC-BY-SA 3.0
Domain              Natural Language Processing
Number of Records   2,783 words, 66,058 unigrams, 262,555 bigrams
Data Split          2,783 words, 66,058 unigrams, 262,555 bigrams
Size                10MB
Author              Orith Toledo-Ronen, Roy Bar-Haim, Charles Jochim, Noam Slonim, Ranit Aharonov
Dataset Origin      IBM Research Project Debater
Dataset Version     1.0.2
Data Coverage       N/A

Dataset Archive Contents

File or Folder          Description
ReleaseNotes.docx       Release notes file describing the data
SEMANTIC_CLASSES.xlsx   The composition lexicons for reversers, propagators, and dominators
ADJECTIVES.xlsx         The composition lexicons for two gradable adjective pairs (high-low, fast-slow) and their expansion list
LEXICON_UG.txt          The unigrams sentiment lexicon
LEXICON_BG.txt          The bigrams sentiment lexicon
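
As a starting point for reading the two text lexicons, here is a hedged Python sketch. The line layout is an assumption, not something this page documents: the code assumes each line holds a term followed by whitespace and a numeric score, and it skips any line that does not fit. Check ReleaseNotes.docx for the actual layout. The function name load_lexicon and the sample values are hypothetical.

```python
import io
from typing import Dict, TextIO


def load_lexicon(handle: TextIO) -> Dict[str, float]:
    """Parse lines of the ASSUMED form '<term><whitespace><score>'."""
    lexicon: Dict[str, float] = {}
    for line in handle:
        # Split off the last whitespace-separated field so that
        # two-word terms (bigrams) stay intact.
        parts = line.strip().rsplit(None, 1)
        if len(parts) != 2:
            continue  # blank line or a line without a score
        term, raw_score = parts
        try:
            lexicon[term] = float(raw_score)
        except ValueError:
            continue  # header row or non-numeric last field
    return lexicon


# Made-up sample lines, not real dataset values.
sample = io.StringIO("term\tscore\ninjury\t-0.8\nfresh injury\t-0.7\n")
scores = load_lexicon(sample)
print(scores)
```

To try it on the real files, pass an open file handle instead of the sample, for example open("LEXICON_UG.txt", encoding="utf-8"); the encoding is also an assumption.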

Use the Dataset

This dataset is complemented by a data exploration and data analysis Python notebook to help you get started.

  • Project Debater: the first AI system that can debate humans on complex topics. Its goal is to help people build persuasive arguments and make well-informed decisions. This dataset contributed to training the models in Project Debater.


author="Orith Toledo-Ronen
and Roy Bar-Haim
and Alon Halfon
and Amir Menczel
and Charles Jochim
and Noam Slonim
and Ranit Aharonov",
title="Learning Sentiment Composition from Sentiment Lexicons",