
IBM Debater® Mention Detection Benchmark

Overview

The benchmark contains 3000 sentences, divided as follows:

  • 1000 sentences taken from Wikipedia articles that discuss various topics, such as those in Debatabase
  • 1000 sentences taken from professional speakers discussing some of those topics. Each of these sentences comes in two forms (resulting in 2000 sentences): the output of an Automatic Speech Recognition (ASR) engine, and a cleansed manual transcription of that output.

There are a total of 6375 mentions in the Wikipedia sentences and 6239 mentions in the spoken sentences.

Dataset Metadata

  • Format: ANN
  • License: CC-BY-SA 3.0
  • Domain: Natural Language Processing
  • Number of Records: 3,000 sentences and mentions
  • Size: 1.8 MB
  • Originally Published: 2018-01-25

Example Records

history|||http://dbpedia.org/resource/History|||34|||41
prejudice|||http://dbpedia.org/resource/Prejudice|||60|||69
societal level|||http://dbpedia.org/resource/Social_structure|||75|||89
consequences|||http://dbpedia.org/resource/Consequentialism|||157|||169
consequences|||http://dbpedia.org/resource/Unintended_consequences|||157|||169
affirmative action|||http://dbpedia.org/resource/Affirmative_action|||241|||259
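
The records above appear to hold four fields separated by `|||`: the mention text, a DBpedia URI, and two integers. The following is a minimal sketch of how such lines could be parsed in Python. It assumes, based only on the sample records (where the difference between the two integers equals the length of the mention text), that the integers are a start offset and an exclusive end offset into the sentence. The names `Mention`, `parse_mention`, and `group_by_span` are hypothetical and not part of the dataset.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Mention(NamedTuple):
    surface: str  # mention text as it appears in the sentence
    uri: str      # linked DBpedia resource
    start: int    # assumed: offset where the mention begins
    end: int      # assumed: offset just past the mention (exclusive)


def parse_mention(line: str) -> Mention:
    """Split one '|||'-delimited record into its four fields."""
    parts = line.rstrip("\n").split("|||")
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
    surface, uri, start, end = parts
    return Mention(surface, uri, int(start), int(end))


def group_by_span(mentions: List[Mention]) -> Dict[Tuple[int, int], List[str]]:
    """Collect the URIs attached to each (start, end) span.

    The sample shows one span ("consequences") linked to two URIs,
    so a span maps to a list rather than a single value.
    """
    by_span: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for m in mentions:
        by_span[(m.start, m.end)].append(m.uri)
    return dict(by_span)


# Three of the sample records shown above.
records = [
    "history|||http://dbpedia.org/resource/History|||34|||41",
    "consequences|||http://dbpedia.org/resource/Consequentialism|||157|||169",
    "consequences|||http://dbpedia.org/resource/Unintended_consequences|||157|||169",
]
mentions = [parse_mention(r) for r in records]
spans = group_by_span(mentions)
```

Grouping by span keeps both candidate links for a mention such as "consequences" instead of silently overwriting one with the other.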
  • IBM Project Debater: Project Debater is the first AI system that can debate humans on complex topics. The goal is to help people build persuasive arguments and make well-informed decisions. This dataset contributed to training the models in Project Debater.