Overview
This dataset addresses sentiment composition – predicting the sentiment of a phrase from the interaction between its constituents. For example, in the phrases “reduced bureaucracy” and “fresh injury”, both “reduced” and “fresh” are followed by a negative word. However, “reduced” flips the negative polarity, resulting in a positive phrase, while “fresh” propagates the negative polarity to the phrase level, resulting in a negative phrase. Accordingly, “reduced” is part of our “reversers” lexicon, and “fresh” is part of the “propagators” lexicon.
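The two behaviours described above can be sketched as a small lookup rule. This is only an illustration of the idea, not the authors' method: the names `REVERSERS`, `PROPAGATORS`, `WORD_POLARITY`, and `compose_bigram` are hypothetical, and the toy entries are taken from the examples in the text rather than from the released files.

```python
# Toy lexicons for illustration only; the real lexicons ship in the dataset archive.
REVERSERS = {"reduced"}      # flip the polarity of the word that follows
PROPAGATORS = {"fresh"}      # pass the following word's polarity up to the phrase
WORD_POLARITY = {"bureaucracy": -1, "injury": -1}  # -1 = negative, +1 = positive


def compose_bigram(modifier, head):
    """Return the phrase polarity (+1 or -1), or None if the rule does not apply."""
    head_polarity = WORD_POLARITY.get(head)
    if head_polarity is None:
        return None  # head word has no known polarity
    if modifier in REVERSERS:
        return -head_polarity
    if modifier in PROPAGATORS:
        return head_polarity
    return None  # modifier is in neither toy lexicon


print(compose_bigram("reduced", "bureaucracy"))  # positive phrase
print(compose_bigram("fresh", "injury"))         # negative phrase
```

A lookup rule like this covers only word pairs whose modifier and head both appear in the lexicons; anything else returns `None` rather than a guessed polarity.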
Dataset Metadata
Field | Value |
---|---|
Format | XLSX, TXT |
License | CC-BY-SA 3.0 |
Domain | Natural Language Processing |
Number of Records | 2,783 words, 66,058 unigrams, 262,555 bigrams |
Data Split | 2,783 words, 66,058 unigrams, 262,555 bigrams |
Size | 10MB |
Author | Orith Toledo-Ronen, Roy Bar-Haim, Charles Jochim, Noam Slonim, Ranit Aharonov |
Dataset Origin | IBM Research Project Debater |
Dataset Version | 1.0.2 |
Data Coverage | N/A |
Dataset Archive Contents
File or Folder | Description |
---|---|
ReleaseNotes.docx | Release notes file describing the data |
SEMANTIC_CLASSES.xlsx | The composition lexicons for reversers, propagators, and dominators |
ADJECTIVES.xlsx | The composition lexicons for two gradable adjective pairs (high-low, fast-slow) and their expansion list |
LEXICON_UG.txt | The unigrams sentiment lexicon |
LEXICON_BG.txt | The bigrams sentiment lexicon |
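As a starting point for reading the two text lexicons, the sketch below parses rows into a dictionary. The column layout is an assumption: this page does not document the file format, so the parser supposes one entry per line with a term and a numeric score separated by a tab. Check `ReleaseNotes.docx` for the actual layout before relying on it. The function name `parse_lexicon` and the sample rows are hypothetical.

```python
def parse_lexicon(lines, delimiter="\t"):
    """Parse 'term<delimiter>score' rows into a dict of term -> float score.

    ASSUMPTION: the term/score column order and the tab delimiter are guesses,
    not a documented format. Rows that do not fit are skipped.
    """
    lexicon = {}
    for line in lines:
        parts = line.rstrip("\r\n").split(delimiter)
        if len(parts) < 2:
            continue  # blank line or row without a delimiter
        term, score = parts[0].strip(), parts[1].strip()
        try:
            lexicon[term] = float(score)
        except ValueError:
            continue  # non-numeric second column, e.g. a header row
    return lexicon


# Made-up sample rows, used only to exercise the parser.
sample_rows = ["fresh injury\t-0.8", "reduced bureaucracy\t0.6", "term\tscore"]
print(parse_lexicon(sample_rows))

# With the real files (paths relative to the extracted archive):
# with open("LEXICON_BG.txt", encoding="utf-8") as handle:
#     bigrams = parse_lexicon(handle)
```

Taking any iterable of lines, rather than a path, keeps the parser easy to try on a few rows before pointing it at the full files.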
Use the Dataset
A Python notebook for data exploration and analysis accompanies this dataset to help you get started.
Related Links
- Project Debater: the first AI system that can debate humans on complex topics. The goal is to help people build persuasive arguments and make well-informed decisions. This dataset contributed to training the models in Project Debater.
Citation
@article{sentiment_composition_lexicons,
author="Orith Toledo-Ronen
and Roy Bar-Haim
and Alon Halfon
and Amir Menczel
and Charles Jochim
and Noam Slonim
and Ranit Aharonov",
title="Learning Sentiment Composition from Sentiment Lexicons",
journal="COLING",
year="2018",
}