Overview

The 3,000 sentences are divided as follows:

– 1,000 sentences taken from Wikipedia articles on various topics, such as those listed in Debatabase (http://idebate.org/debatabase).

– 1,000 sentences spoken by professional speakers discussing some of those topics. Each of these sentences appears in two forms, giving 2,000 sentences in total: the output of an Automatic Speech Recognition (ASR) engine, and a cleaned manual transcription of the same speech.

In total, there are 6,375 mentions in the Wikipedia sentences and 6,239 mentions in the spoken sentences.

Dataset Metadata

Format: ANN
License: CC-BY-SA 3.0
Domain: Natural Language Processing
Number of Records: 3,000 sentences and mentions
Size: 1.8 MB
Originally Published: January 25, 2018

Example Records

history|||http://dbpedia.org/resource/History|||34|||41
prejudice|||http://dbpedia.org/resource/Prejudice|||60|||69
societal level|||http://dbpedia.org/resource/Social_structure|||75|||89
consequences|||http://dbpedia.org/resource/Consequentialism|||157|||169
consequences|||http://dbpedia.org/resource/Unintended_consequences|||157|||169
affirmative action|||http://dbpedia.org/resource/Affirmative_action|||241|||259
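Judging from the examples, each record appears to hold four `|||`-separated fields: the mention text, the linked DBpedia resource, and the start and end character offsets of the mention in its sentence. In every example the end offset minus the start offset equals the mention length (for instance, "history" spans 34 to 41), which suggests the end offset is exclusive. Note also that one span can carry several links, as with "consequences" at 157 to 169. The field order and offset convention are inferences from these six lines, not a documented specification. The following is a minimal Python sketch under those assumptions that parses records, checks the offsets, and groups all links for the same span. The names `Mention`, `parse_line`, and `group_by_span` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str   # surface form of the mention
    uri: str    # linked DBpedia resource
    start: int  # assumed: offset of the first character in the sentence
    end: int    # assumed: exclusive end offset


def parse_line(line: str) -> Mention:
    """Parse one 'mention|||uri|||start|||end' record (field order inferred from the examples)."""
    text, uri, start, end = line.rstrip("\n").split("|||")
    mention = Mention(text, uri, int(start), int(end))
    # Sanity check: with exclusive end offsets, the span length equals the mention length.
    if mention.end - mention.start != len(mention.text):
        raise ValueError(f"offsets do not match mention length: {line!r}")
    return mention


def group_by_span(mentions):
    """Collect every URI linked to the same (start, end, text) span, in file order."""
    spans = defaultdict(list)
    for m in mentions:
        spans[(m.start, m.end, m.text)].append(m.uri)
    return dict(spans)


# The example records shown above.
SAMPLE = """\
history|||http://dbpedia.org/resource/History|||34|||41
prejudice|||http://dbpedia.org/resource/Prejudice|||60|||69
societal level|||http://dbpedia.org/resource/Social_structure|||75|||89
consequences|||http://dbpedia.org/resource/Consequentialism|||157|||169
consequences|||http://dbpedia.org/resource/Unintended_consequences|||157|||169
affirmative action|||http://dbpedia.org/resource/Affirmative_action|||241|||259
"""

mentions = [parse_line(line) for line in SAMPLE.splitlines() if line.strip()]
spans = group_by_span(mentions)

for (start, end, text), uris in spans.items():
    print(f"{start:>4}-{end:<4} {text!r}: {', '.join(uris)}")
```

The six records collapse into five distinct spans, and the "consequences" span keeps both of its links. Grouping by span rather than by line is one way to handle the multiple links per span that these examples show. If the sentence text is available alongside the annotations, `sentence[m.start:m.end] == m.text` would be a stronger check than the length comparison used here.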
Related Project

IBM Project Debater: Project Debater is the first AI system that can debate humans on complex topics. Its goal is to help people build persuasive arguments and make well-informed decisions. This dataset contributed to training the models in Project Debater.