IBM Debater® Mention Detection Benchmark
The goal of Mention Detection is to map entities/concepts mentioned in text to the correct concept in a knowledge base. The dataset contains 3000 sentences that are annotated with Mentions.
The 3000 sentences are divided as follows.
- 1000 sentences taken from Wikipedia articles that discuss various topics, such as those in Debatabase
- 1000 sentences taken from professional speakers discussing some of those topics. Each of these sentences appears in two forms (resulting in 2000 sentences): the output of an Automatic Speech Recognition (ASR) engine, and a cleaned manual transcription of the same speech.
There are a total of 6375 Mentions in the Wikipedia sentences and 6239 Mentions in the spoken sentences.
| Format | License | Domain | Number of Records | Size | Originally Published |
|---|---|---|---|---|---|
| | CC-BY-SA 3.0 | Natural Language Processing | 3,000 sentences and mentions | | |
history|||http://dbpedia.org/resource/History|||34|||41
prejudice|||http://dbpedia.org/resource/Prejudice|||60|||69
societal level|||http://dbpedia.org/resource/Social_structure|||75|||89
consequences|||http://dbpedia.org/resource/Consequentialism|||157|||169
consequences|||http://dbpedia.org/resource/Unintended_consequences|||157|||169
affirmative action|||http://dbpedia.org/resource/Affirmative_action|||241|||259
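The sample annotations above appear to follow the pattern `mention|||DBpedia URI|||start|||end`. The following is a minimal Python sketch for reading such records, not the dataset's official loader. It rests on assumptions inferred only from the sample: the two numbers are taken to be character offsets into the sentence with an exclusive end (in every sample record, `end - start` equals the length of the mention text), and records are taken to be separated by whitespace. The names `Mention`, `RECORD`, `parse_mentions`, and `candidates` are hypothetical and introduced here for illustration.

```python
import re
from collections import defaultdict
from typing import List, NamedTuple


class Mention(NamedTuple):
    surface: str  # mention text as it appears in the sentence
    uri: str      # linked knowledge-base concept (DBpedia in the sample)
    start: int    # assumed: character offset where the mention begins
    end: int      # assumed: exclusive character offset where it ends


# One record: surface|||uri|||start|||end. Records may be separated by
# spaces or newlines, so leading whitespace is skipped before each one.
RECORD = re.compile(r"\s*(.+?)\|\|\|(\S+?)\|\|\|(\d+)\|\|\|(\d+)")


def parse_mentions(text: str) -> List[Mention]:
    """Extract every mention record found in `text`."""
    return [
        Mention(m.group(1), m.group(2), int(m.group(3)), int(m.group(4)))
        for m in RECORD.finditer(text)
    ]


SAMPLE = (
    "history|||http://dbpedia.org/resource/History|||34|||41 "
    "prejudice|||http://dbpedia.org/resource/Prejudice|||60|||69 "
    "societal level|||http://dbpedia.org/resource/Social_structure|||75|||89 "
    "consequences|||http://dbpedia.org/resource/Consequentialism|||157|||169 "
    "consequences|||http://dbpedia.org/resource/Unintended_consequences|||157|||169 "
    "affirmative action|||http://dbpedia.org/resource/Affirmative_action|||241|||259"
)

mentions = parse_mentions(SAMPLE)

# Sanity check for the offset assumption: span length == mention length.
for m in mentions:
    assert m.end - m.start == len(m.surface)

# Group URIs by span; one span can carry more than one linked concept.
candidates = defaultdict(list)
for m in mentions:
    candidates[(m.start, m.end)].append(m.uri)

for span, uris in candidates.items():
    print(span, uris)
```

Grouping by span makes it visible that the same mention can be linked to more than one concept: in the sample, "consequences" at 157 to 169 is annotated with both `Consequentialism` and `Unintended_consequences`. Because the sentence text itself is not part of the sample, the offset interpretation cannot be confirmed here and should be checked against the actual data files.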
- IBM Project Debater: Project Debater is the first AI system that can debate humans on complex topics. The goal is to help people build persuasive arguments and make well-informed decisions. This dataset contributed to training the models in Project Debater.