IBM Debater® Recorded Debating #3


Engaging in a competitive debate requires Project Debater to effectively rebut arguments raised by the human opponent. The system must listen to an argumentative speech in real time, understand the main arguments, and produce persuasive counter-arguments.

The nature of the argumentation domain and the characteristics of competitive debates make understanding such spoken content challenging. Expressed ideas often span multiple, non-consecutive sentences, and many arguments are alluded to rather than stated explicitly. A further difficulty stems from the need to identify and rebut the most important parts of a speech that is several minutes long. This contrasts with today’s conversational agents, which aim to understand a single functional command from a short input. The goal of the Recorded Debates Dataset is to form a basis for developing listening comprehension algorithms in this challenging setting.

Release #3 includes 400 speeches on 200 controversial topics, provided in the following formats:

  • The recorded audio (wav files)
  • Text produced from the audio using an automatic speech recognition (ASR) system (text files)
  • A manually corrected transcript of the ASR text, created by expert annotators (text files)

Both the ASR output and the manual transcripts are provided in two versions: a raw form that also records the time within the audio at which each utterance was spoken, and a clean, “NLP-friendly” version containing only the spoken words.
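
The relationship between the raw, time-stamped text and the clean version can be illustrated with a minimal sketch. The description above does not specify the raw file layout, so the sketch assumes a hypothetical format in which each line holds a start time and an end time in seconds followed by the utterance (for example `0.00 3.10 we should ban boxing`). The `Utterance` type, the function names and this line layout are illustrative assumptions, not the dataset's actual schema; inspect the released files before adapting it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    start: float  # seconds from the start of the audio (assumed field)
    end: float    # seconds from the start of the audio (assumed field)
    text: str     # the words spoken in this time span


def parse_raw(lines: List[str]) -> List[Utterance]:
    """Parse hypothetical 'start end text' lines; skip blank or word-less lines."""
    utterances = []
    for line in lines:
        parts = line.strip().split(maxsplit=2)
        if len(parts) < 3:
            continue  # blank line, or a time span with no spoken words
        start, end, text = parts
        utterances.append(Utterance(float(start), float(end), text))
    return utterances


def to_clean_text(utterances: List[Utterance]) -> str:
    """Drop the timing information and keep only the spoken words."""
    return " ".join(u.text for u in utterances)


if __name__ == "__main__":
    # Made-up example lines in the assumed raw layout.
    raw = ["0.00 3.10 we should ban boxing", "3.10 6.75 because it causes brain injuries"]
    print(to_clean_text(parse_raw(raw)))
```

Keeping the timestamps in a separate structure, rather than discarding them on read, lets the same parse serve both alignment with the wav file and text-only processing.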

An additional annotation layer provides, for each speech, a list of claims automatically mined from a large text corpus. Each claim is annotated according to whether the speech mentions it explicitly, implicitly, or not at all.
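
One way to work with this annotation layer is to treat each (speech, claim) pair as a three-way label and, where a binary target is needed, to count explicit and implicit mentions together as "mentioned". The sketch below assumes a hypothetical in-memory representation: the `Mention` label names, the `ClaimAnnotation` fields and the binary collapse are illustrative choices, not the dataset's file format or the authors' evaluation protocol.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class Mention(Enum):
    EXPLICIT = "explicit"
    IMPLICIT = "implicit"
    NOT_MENTIONED = "not_mentioned"


@dataclass(frozen=True)
class ClaimAnnotation:
    speech_id: str   # hypothetical identifier of the speech
    claim: str       # text of the automatically mined claim
    label: Mention   # how (or whether) the speech mentions the claim


def is_mentioned(a: ClaimAnnotation) -> bool:
    """Collapse the three-way label: explicit or implicit counts as mentioned."""
    return a.label is not Mention.NOT_MENTIONED


def label_counts(annotations: Iterable[ClaimAnnotation]) -> Dict[Mention, int]:
    """Count how many annotations fall under each label."""
    return dict(Counter(a.label for a in annotations))


def mentioned_claims(annotations: Iterable[ClaimAnnotation], speech_id: str) -> List[str]:
    """Return the claims a given speech mentions, explicitly or implicitly."""
    return [a.claim for a in annotations if a.speech_id == speech_id and is_mentioned(a)]
```

For instance, a speech annotated with one explicit, one implicit and one unmentioned claim yields two entries from `mentioned_claims`, which are the candidates a rebuttal system would need to address.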

Dataset Metadata

  • License: CC-BY-SA 3.0
  • Domain: Natural Language Processing
  • Number of records: 400 speeches, 200 controversial topics, 4,876 annotated claims


author = {Tamar Lavee and Matan Orbach and Lili Kotlerman and Yoav Kantor and Shai Gretz and Lena Dankin and Shachar Mirkin and Michal Jacovi and Yonatan Bilu and Ranit Aharonov and Noam Slonim},
title = {Towards Effective Rebuttal: Listening Comprehension using Corpus-Wide Claim Mining},
journal = {CoRR},
volume = {abs/1907.11889},
year = {2019},
url = {},
archivePrefix = {arXiv},
eprint = {1907.11889},
timestamp = {Thu, 01 Aug 2019 08:59:33 +0200},
biburl = {},
bibsource = {dblp computer science bibliography,}
  • Project Debater: Project Debater is the first AI system that can debate humans on complex topics. Its goal is to help people build persuasive arguments and make well-informed decisions. This dataset contributed to training the models in Project Debater.