The 3000 sentences are divided as follows.
– 1000 sentences taken from Wikipedia articles that discuss various topics, such as those in Debatabase (http://idebate.org/debatabase)
– 1000 sentences taken from professional speakers discussing some of those topics. Each of these sentences appears in two forms, giving 2000 sentences in total: the output of an Automatic Speech Recognition (ASR) engine, and a cleaned manual transcription of the same speech.
There are a total of 6375 mentions in the Wikipedia sentences and 6239 mentions in the spoken sentences.
|Format||License||Domain||Number of Records||Size||Originally Published|
||CC-BY-SA 3.0||Natural Language Processing||3,000 sentences and mentions||1.8 MB||January 25, 2018|
history|||http://dbpedia.org/resource/History|||34|||41
prejudice|||http://dbpedia.org/resource/Prejudice|||60|||69
societal level|||http://dbpedia.org/resource/Social_structure|||75|||89
consequences|||http://dbpedia.org/resource/Consequentialism|||157|||169
consequences|||http://dbpedia.org/resource/Unintended_consequences|||157|||169
affirmative action|||http://dbpedia.org/resource/Affirmative_action|||241|||259
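The sample annotations above appear to follow the pattern `mention|||DBpedia URI|||start|||end`, with `|||` as the field separator. The field names below, and the reading of the two numbers as character offsets into the sentence with an exclusive end, are assumptions inferred from the sample (in each sample record, `end - start` equals the length of the mention text); they are not taken from the dataset documentation. Under those assumptions, a minimal Python sketch for parsing one record might look like this:

```python
from typing import NamedTuple


class Mention(NamedTuple):
    """One annotation record; field names are illustrative, not official."""
    text: str    # surface form of the mention
    uri: str     # linked DBpedia resource
    start: int   # assumed: character offset where the mention begins
    end: int     # assumed: character offset just past the mention


def parse_mention(record: str) -> Mention:
    """Split a 'text|||uri|||start|||end' record into a Mention."""
    parts = record.strip().split("|||")
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {record!r}")
    text, uri, start, end = parts
    return Mention(text, uri, int(start), int(end))


# Two records copied from the sample above.
records = [
    "history|||http://dbpedia.org/resource/History|||34|||41",
    "affirmative action|||http://dbpedia.org/resource/Affirmative_action|||241|||259",
]
mentions = [parse_mention(r) for r in records]

# Sanity check for the offset assumption: span length equals text length.
for m in mentions:
    assert m.end - m.start == len(m.text)
```

Note that the same span can be linked to more than one resource, as with "consequences" in the sample, so a consumer should not assume that offsets are unique within a sentence.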
- IBM Project Debater: Project Debater is the first AI system that can debate humans on complex topics. The goal is to help people build persuasive arguments and make well-informed decisions. This dataset contributed to training the models in Project Debater.
- Data Asset eXchange (DAX): Explore useful and relevant data sets for enterprise data science.
- Model Asset eXchange (MAX): A place for developers to find and use free and open source deep learning models.
- Center for Open-Source Data & AI Technologies (CODAIT): Improving the Enterprise AI Lifecycle in Open Source.