Overview

The dataset contains 2485 paragraphs, comprising 4002 sentences from audience-addressed speeches, compiled from Wikipedia 2012.

The paragraphs were given to a professional female speaker, a native speaker of US English, who was instructed to read them in a persuasive and lively manner. Based on the recorded speech, each sentence was then annotated for emphasized words by 4 professional labelers. The labelers received the original text and the recordings, together with the following guidelines:

i) Emphasized words are words that clearly stand out in the speech relative to most of the other words in the sentence.

ii) Label based only on the speech, not on the importance of the word in the text.
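
Because each sentence carries labels from 4 annotators, a consumer of the data will typically want a per-word count of how many labelers marked a word as emphasized. The following is a minimal sketch of that bookkeeping, not the dataset's own tooling: the function names, the representation of one labeler's annotation as a set of token indices, the `min_votes` threshold, and the sample annotations themselves are all hypothetical and chosen only for illustration. The actual file layout and any aggregation rule used by the authors are not described here.

```python
from collections import Counter


def emphasis_votes(tokens, annotations):
    """Return, for each token position, how many labelers marked it as emphasized.

    `annotations` is assumed to hold one set of token indices per labeler
    (a hypothetical representation, not the dataset's file format).
    """
    votes = Counter()
    for marked in annotations:
        for idx in marked:
            if not 0 <= idx < len(tokens):
                raise IndexError(f"token index {idx} is out of range")
            votes[idx] += 1
    return [votes[i] for i in range(len(tokens))]


def emphasized_words(tokens, annotations, min_votes=2):
    """Keep the tokens marked by at least `min_votes` labelers (assumed threshold)."""
    counts = emphasis_votes(tokens, annotations)
    return [tok for tok, count in zip(tokens, counts) if count >= min_votes]


# Sentence text taken from the example records; the four annotations are made up.
tokens = "The pursuit of doping athletes has turned into a modern day witch - hunt .".split()
annotations = [{3, 11, 13}, {3, 11}, {3, 9}, {3, 11, 13}]

print(emphasized_words(tokens, annotations))               # ['doping', 'witch', 'hunt']
print(emphasized_words(tokens, annotations, min_votes=3))  # ['doping', 'witch']
```

Raising `min_votes` trades recall for agreement: a threshold of 4 keeps only words that every labeler heard as emphasized, while a threshold of 1 keeps any word that at least one labeler marked.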

Dataset Metadata

Format: TXT
License: CC-BY-SA 3.0
Domain: Natural Language Processing
Number of Records: 2485 sentences with emphasized words
Size: 3.3 MB
Originally Published: September 02, 2018

Example Records

Exposure to  violent  video games causes at least a temporary increase in  aggression  and this exposure correlates with  aggression  in the real world.
A single child would be left with having to provide support for his or her  two  parents and  four  grandparents.
The pursuit of  doping  athletes has turned into a modern day  witch - hunt .

IBM Project Debater

Project Debater is the first AI system that can debate humans on complex topics. The goal is to help people build persuasive arguments and make well-informed decisions. This dataset contributed to training the models in Project Debater.