
IBM Debater® Sentiment Lexicon of IDiomatic Expressions (SLIDE)


SLIDE (Sentiment Lexicon of IDiomatic Expressions) is a crowdsourced resource for sentiment analysis. The lexicon contains 5,000 idioms selected from Wiktionary that occur frequently, as estimated from a large English corpus. At least ten annotators labeled each idiom as positive, negative, neutral, or inappropriate, and over 40% of the idioms were labeled as sentiment-bearing. The lexicon provides a sentiment label for each idiom along with the distribution of its sentiment annotations.

Each label is the one that received the most votes in the crowdsourced annotation. When positive (or negative) ties with neutral, the label is positive (resp. negative). In the rare case of a tie between positive and negative, the label is neutral. The resulting lexicon has 946 positive idioms, 1,108 negative, 2,945 neutral, and 1 inappropriate.
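The voting rule described above can be sketched as a small function. This is only an illustration, not the authors' code: the name `assign_label` is hypothetical, and ties that the description does not cover (three-way ties, or ties involving the inappropriate label) raise an error rather than guess.

```python
def assign_label(pos: int, neg: int, neu: int, inappropriate: int = 0) -> str:
    """Pick a sentiment label from annotation counts (hypothetical helper)."""
    counts = {
        "positive": pos,
        "negative": neg,
        "neutral": neu,
        "inappropriate": inappropriate,
    }
    top = max(counts.values())
    winners = {label for label, n in counts.items() if n == top}

    # Clear majority: the label with the greatest number of votes wins.
    if len(winners) == 1:
        return next(iter(winners))
    # Tie between a polar label and neutral: the polar label wins.
    if winners == {"positive", "neutral"}:
        return "positive"
    if winners == {"negative", "neutral"}:
        return "negative"
    # Tie between positive and negative: fall back to neutral.
    if winners == {"positive", "negative"}:
        return "neutral"
    # Assumption: any other tie is not specified in the description.
    raise ValueError(f"tie not covered by the stated rule: {sorted(winners)}")
```

For example, `assign_label(5, 0, 5)` resolves the positive/neutral tie in favor of positive, while `assign_label(5, 5, 0)` falls back to neutral.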

The released data file has 12 columns:

  • Column A: Idiom expression
  • Column B: Link to idiom in Wiktionary
  • Column C: Count of positive annotations
  • Column D: Count of negative annotations
  • Column E: Count of neutral annotations
  • Column F: Count of annotations where the expression was deemed vulgar or inappropriate
  • Column G: Total annotations
  • Column H: Percent positive (given as a fraction between 0 and 1, as in the example record)
  • Column I: Percent negative (fraction between 0 and 1)
  • Column J: Percent neutral (fraction between 0 and 1)
  • Column K: Sentiment label
  • Column L: Ambiguous expression filter — ‘X’ indicates removal (see paper, Section 4)
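A minimal sketch of reading the twelve columns and applying the Column L filter follows. Several things here are assumptions rather than facts about the release: the field names in `COLUMNS`, the comma delimiter, the absence of a header row, and the helper names `read_slide` and `drop_ambiguous`. The second sample row is invented purely to exercise the filter.

```python
import csv
import io

# Hypothetical field names for columns A-L; the released file may differ.
COLUMNS = [
    "idiom", "wiktionary_url", "pos", "neg", "neu", "inappropriate",
    "total", "pct_pos", "pct_neg", "pct_neu", "label", "filter",
]
INT_FIELDS = ("pos", "neg", "neu", "inappropriate", "total")
FLOAT_FIELDS = ("pct_pos", "pct_neg", "pct_neu")


def read_slide(handle, delimiter=","):
    """Parse rows into dicts. Assumes no header row and the given delimiter."""
    rows = []
    for raw in csv.reader(handle, delimiter=delimiter):
        if not raw:
            continue  # skip blank lines
        # Pad rows whose trailing (empty) filter column was dropped.
        raw = raw + [""] * (len(COLUMNS) - len(raw))
        row = dict(zip(COLUMNS, raw))
        for key in INT_FIELDS:
            row[key] = int(row[key])
        for key in FLOAT_FIELDS:
            row[key] = float(row[key])
        rows.append(row)
    return rows


def drop_ambiguous(rows):
    """Keep only rows that are not marked 'X' in column L."""
    return [r for r in rows if r["filter"].strip().upper() != "X"]


# First row is the example record from this page; second row is made up.
sample = io.StringIO(
    "alive and kicking,https://en.wiktionary.org/wiki/alive_and_kicking,"
    "10,0,0,0,10,1.000,0.000,0.000,positive,\n"
    "made-up idiom,https://example.org/made-up,"
    "5,0,5,0,10,0.500,0.000,0.500,positive,X\n"
)
rows = read_slide(sample)
kept = drop_ambiguous(rows)
print(len(rows), len(kept))
```

To read an actual file, open it with `newline=""` and pass the handle to `read_slide`, adjusting `delimiter` (for instance to `"\t"`) if the release turns out to be tab-separated.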

Dataset Metadata

  • License: CC-BY-SA 3.0
  • Domain: Natural Language Processing
  • Number of Records: 5,000 sentiment-annotated idioms

Example Records

alive and kicking    https://en.wiktionary.org/wiki/alive_and_kicking    10    0    0    0    10    1.000    0.000    0.000    positive
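As a quick sanity check, the example record can be split into fields and its fraction columns compared against the counts. This sketch assumes the fields are separated by tabs or runs of spaces (as they appear on this page) and that each fraction equals its count divided by the total, rounded to three decimals; neither assumption is stated by the dataset itself.

```python
import re

record = (
    "alive and kicking    https://en.wiktionary.org/wiki/alive_and_kicking"
    "    10    0    0    0    10    1.000    0.000    0.000    positive"
)

# Split on a tab or on two or more whitespace characters, so that the
# single spaces inside the idiom itself are preserved.
fields = re.split(r"\t|\s{2,}", record.strip())

idiom, url = fields[0], fields[1]
pos, neg, neu, inappropriate, total = map(int, fields[2:7])
fractions = [float(x) for x in fields[7:10]]
label = fields[10]

# Recompute the fractions from the counts (assumed relationship).
expected = [round(count / total, 3) for count in (pos, neg, neu)]
consistent = fractions == expected
print(idiom, label, consistent)
```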


Citation

title = "{SLIDE}---a Sentiment Lexicon of Common Idioms",
author = "Jochim, Charles and Bonin, Francesca and Bar-Haim, Roy and Slonim, Noam",
booktitle = "Proceedings of the Eleventh International Conference on Language Resources and Evaluation ({LREC}-2018)",
month = may,
year = "2018",
address = "Miyazaki, Japan",
publisher = "European Language Resources Association (ELRA)",
url = "https://www.aclweb.org/anthology/L18-1379",

  • Project Debater: Project Debater is the first AI system that can debate humans on complex topics. The goal is to help people build persuasive arguments and make well-informed decisions. This dataset contributed to training the models in Project Debater.